Abstract Operator splitting schemes have been successfully used in computational sciences to reduce complex
problems into a series of simpler subproblems. Since the 1950s, these schemes have been widely used to
solve problems in PDE and control. Recently, large-scale optimization problems in machine learning, signal
processing, and imaging have created a resurgence of interest in operator-splitting based algorithms because
they often have simple descriptions, are easy to code, and have (nearly) state-of-the-art performance for
large-scale optimization problems. Although operator splitting techniques were introduced over 60 years
ago, their importance has significantly increased in the past decade.

This paper introduces a new operator-splitting scheme for solving a variety of problems that are reduced 
to a monotone inclusion of three operators, one of which is cocoercive. Our scheme is very simple, and 
it does not reduce to any existing splitting schemes. Our scheme recovers the existing forward-backward, 
Douglas-Rachford, and forward-Douglas-Rachford splitting schemes as special cases. 

Our new splitting scheme leads to a set of new and simple algorithms for a variety of other problems, 
including the 3-set split feasibility problems, 3-objective minimization problems, and doubly and multiple 
regularization problems, as well as the simplest extension of the classic ADMM from 2 to 3 blocks of variables. 
In addition to the basic scheme, we introduce several modifications and enhancements that can improve the 
convergence rate in practice, including an acceleration that achieves the optimal rate of convergence for 
strongly monotone inclusions. Finally, we evaluate the algorithm on several applications. 


1 Introduction 

Operator splitting schemes reduce complex problems built from simple pieces into a series of smaller subproblems
which can be solved sequentially or in parallel. Since the 1950s they have been successfully applied
to problems in PDE and control, but recent large-scale applications in machine learning, signal processing,
and imaging have created a resurgence of interest in operator-splitting based algorithms. These algorithms
often have very simple descriptions, are straightforward to implement on computers, and have (nearly) state-of-the-art
performance for large-scale optimization problems. Although operator splitting techniques were
introduced over 60 years ago, their importance has significantly increased in the past decade.

This paper introduces a new operator-splitting scheme, which solves nonsmooth optimization problems 
of many different forms, as well as monotone inclusions. In an abstract form, this new splitting scheme will 

find $x \in \mathcal{H}$ such that $0 \in Ax + Bx + Cx$ (1.1)

for three maximal monotone operators $A$, $B$, $C$ defined on a Hilbert space $\mathcal{H}$, where the operator $C$ is
cocoercive.$^1$

The most straightforward example of (1.1) arises from the optimization problem 

minimize $f(x) + g(x) + h(x)$, (1.2)

where $f$, $g$, and $h$ are proper, closed, and convex functions and $h$ is Lipschitz differentiable. The first-order
optimality condition of (1.2) reduces to (1.1) with $Ax = \partial f(x)$, $Bx = \partial g(x)$, and $Cx = \nabla h(x)$, where $\partial f$, $\partial g$
are subdifferentials of $f$ and $g$, respectively. Note that $C$ is cocoercive because $h$ is Lipschitz differentiable.

A number of other examples of (1.1) can be found in Section 2 including split feasibility, doubly regularized,
and monotropic programming problems, which have surprisingly many applications.

To introduce our splitting scheme, let $I_{\mathcal{H}}$ denote the identity map in $\mathcal{H}$ and $J_S := (I + S)^{-1}$ denote the
resolvent of a monotone operator $S$. (When $S = \partial f$, $J_S(x)$ reduces to the proximal map: $\operatorname{argmin}_y f(y) + \frac{1}{2}\|x - y\|^2$.)
Let $\gamma \in (0, 2\beta)$ be a scalar, where $\beta$ is the cocoercivity constant of $C$. Our splitting scheme for solving (1.1) is summarized by the operator

$T := I_{\mathcal{H}} - J_{\gamma B} + J_{\gamma A} \circ (2 J_{\gamma B} - I_{\mathcal{H}} - \gamma C \circ J_{\gamma B})$. (1.3)

Calculating $Tx$ requires evaluating $J_{\gamma A}$, $J_{\gamma B}$, and $C$ only once each, though $J_{\gamma B}$ appears three times in $T$.
In addition, we will show that a fixed point of $T$ encodes a solution to (1.1) and $T$ is an averaged operator.
The problem (1.1) can be solved by iterating

$z^{k+1} := (1 - \lambda_k) z^k + \lambda_k T z^k$, (1.4)

where $z^0$ is an arbitrary point and $\lambda_k \in (0, (4\beta - \gamma)/2\beta)$ is a relaxation parameter. (For simplicity, one can
fix $\gamma < 2\beta$ and $\lambda_k = 1$.) This iteration can be implemented as follows:

Algorithm 1 Set an arbitrary point $z^0 \in \mathcal{H}$, stepsize $\gamma \in (0, 2\beta)$, and relaxation sequence $(\lambda_j)_{j \ge 0} \in (0, (4\beta - \gamma)/2\beta)$. For $k = 0, 1, \ldots$, iterate:

1. get $x_B^k = J_{\gamma B}(z^k)$;

2. get $x_A^k = J_{\gamma A}(2 x_B^k - z^k - \gamma C x_B^k)$; //comment: $x_A^k = J_{\gamma A} \circ (2 J_{\gamma B} - I_{\mathcal{H}} - \gamma C \circ J_{\gamma B}) z^k$

3. get $z^{k+1} = z^k + \lambda_k (x_A^k - x_B^k)$; //comment: $z^{k+1} = (1 - \lambda_k) z^k + \lambda_k T z^k$
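The three steps of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `three_operator_splitting`, the callable signatures `prox_A(v, gamma)` and `prox_B(v, gamma)`, and the toy data `b` and `mu` are all assumptions made for the example. The toy problem takes $f$ as the indicator of the nonnegative orthant, $g = \mu\|x\|_1$, and $h = \frac{1}{2}\|x - b\|^2$, so $\nabla h$ is 1-Lipschitz and $\gamma = 1$ lies inside $(0, 2\beta)$.

```python
import numpy as np

def three_operator_splitting(prox_A, prox_B, C, z0, gamma, lam=1.0, iters=500):
    """Iterate z <- z + lam * (x_A - x_B), following the steps of Algorithm 1.

    prox_A(v, gamma) and prox_B(v, gamma) stand in for the resolvents
    J_{gamma A} and J_{gamma B}; C(x) is the cocoercive operator.
    """
    z = np.asarray(z0, dtype=float).copy()
    for _ in range(iters):
        x_B = prox_B(z, gamma)                               # step 1
        x_A = prox_A(2.0 * x_B - z - gamma * C(x_B), gamma)  # step 2
        z = z + lam * (x_A - x_B)                            # step 3
    return prox_B(z, gamma)

# Hypothetical toy instance: minimize mu*||x||_1 + 0.5*||x - b||^2 over x >= 0.
b = np.array([3.0, -2.0, 0.5, 1.2])
mu = 1.0
prox_f = lambda v, gamma: np.maximum(v, 0.0)  # projection onto x >= 0
prox_g = lambda v, gamma: np.sign(v) * np.maximum(np.abs(v) - gamma * mu, 0.0)
grad_h = lambda x: x - b

x_hat = three_operator_splitting(prox_f, prox_g, grad_h, np.zeros(4), gamma=1.0)
x_closed_form = np.maximum(b - mu, 0.0)  # coordinate-wise minimizer of the toy problem
```

For this separable problem the minimizer is available in closed form, which gives a convenient check on the iteration.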

Algorithm 1 leads to new algorithms for a large number of applications, which are given in Section 2 below. 
Although some of those applications can be solved by other splitting methods, for example, by the alternating
directions method of multipliers (ADMM), our new algorithms are typically simpler, use fewer or no
additional variables, and take advantage of the differentiability of smooth terms in the objective function. 
The dual form of our algorithm is the simplest extension of ADMM from the classic two-block form to the 
three-block form that has a general convergence result. The details of these are given in Section 2. 

The full convergence result for Algorithm 1 is stated in Theorem 3.1. For brevity we include the following 
simpler version here: 


$^1$ An operator $C$ is $\beta$-cocoercive (or $\beta$-inverse-strongly monotone), $\beta > 0$, if $\langle Cx - Cy, x - y \rangle \ge \beta\|Cx - Cy\|^2$, $\forall x, y \in \mathcal{H}$.

This property generalizes many others. In particular, $\nabla h$ of an $L$-Lipschitz differentiable convex function $h$ is $1/L$-cocoercive.

Theorem 1.1 (Convergence of Algorithm 1) Suppose that $\operatorname{Fix} T \ne \emptyset$. Let $\alpha = 2\beta/(4\beta - \gamma)$ and suppose
that $(\lambda_j)_{j \ge 0}$ satisfies $\sum_{j=0}^{\infty} (1 - \lambda_j/\alpha)\lambda_j/\alpha = \infty$ (which is true if the sequence is strictly bounded away from 0
and $1/\alpha$). Then the sequences $(z^j)_{j \ge 0}$, $(x_B^j)_{j \ge 0}$, and $(x_A^j)_{j \ge 0}$ generated by Algorithm 1 satisfy the following:

1. $(z^j)_{j \ge 0}$ converges weakly to a fixed point of $T$; and

2. $(x_B^j)_{j \ge 0}$ and $(x_A^j)_{j \ge 0}$ converge weakly to an element of $\operatorname{zer}(A + B + C)$.


1.1 Existing two-operator splitting schemes 

A large variety of recent algorithms [12,26,37] and their generalizations and enhancements [4,7,6,8,16,17,
18,20,28,39] are (skillful) applications of one of the following three operator-splitting schemes: (i) forward-backward-forward
splitting (FBFS) [38], (ii) forward-backward splitting (FBS) [36], and (iii) Douglas-Rachford
splitting (DRS) [32], which all split the sum of two operators. (The recently introduced forward-Douglas-Rachford
splitting (FDRS) turns out to be a special case of FBS applied to a suitable monotone inclusion [23,
Section 7].) Until now, these algorithms have been the only basic operator-splitting schemes for monotone inclusions,
if we ignore variants involving inertial dynamics, special metrics, Bregman divergences, or different
stepsize choices.$^2$ To our knowledge, no new splitting schemes have been proposed since the introduction of
FBFS in 2000.

The proposed splitting scheme T in Equation (1.3) is the first algorithm to split the sum of three operators 
that does not appear to reduce to any of the existing schemes. In fact, FBS, DRS, and FDRS are special 
cases of Algorithm 1. 

Proposition 1.1 (Existing operator splitting schemes as special cases) 

1. Consider the forward-backward splitting (FBS) operator [36], $T_{\mathrm{FBS}} := J_{\gamma A} \circ (I_{\mathcal{H}} - \gamma C)$, for solving
$0 \in Ax + Cx$ where $A$ is maximal monotone and $C$ is cocoercive. If we set $B = 0$ in (1.3), then $T = T_{\mathrm{FBS}}$.

2. Consider the Douglas-Rachford splitting (DRS) operator [32], $T_{\mathrm{DRS}} := I_{\mathcal{H}} - J_{\gamma B} + J_{\gamma A} \circ (2 J_{\gamma B} - I_{\mathcal{H}})$, for
solving $0 \in Ax + Bx$ where $A$, $B$ are maximal monotone. If we set $C = 0$ in (1.3), then $T = T_{\mathrm{DRS}}$.

3. Consider the forward-Douglas-Rachford splitting (FDRS) operator [9], $T_{\mathrm{FDRS}} := I_{\mathcal{H}} - P_V + J_{\gamma A} \circ (2 P_V - I_{\mathcal{H}} - \gamma P_V \circ C' \circ P_V)$,
for solving $0 \in Ax + C'x + N_V x$ where $A$ is maximal monotone, $C'$ is cocoercive,
$V$ is a closed vector space, $N_V$ is the normal cone operator of $V$, and $P_V$ denotes the projection to $V$. If
we set $B = N_V$ and $C = P_V \circ C' \circ P_V$ in (1.3), then $T = T_{\mathrm{FDRS}}$.
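The first two reductions in Proposition 1.1 are easy to check numerically. The sketch below is only an illustration under assumed toy operators (a box and an orthant for the normal cones, and the gradient of a quadratic for $C$); the helper name `T` and all data are hypothetical. It uses the facts that the resolvent of $B = 0$ is the identity and that the resolvent of a normal cone is the projection.

```python
import numpy as np

def T(z, J_A, J_B, C, gamma):
    """One application of the operator T from (1.3)."""
    x_B = J_B(z)
    return z - x_B + J_A(2.0 * x_B - z - gamma * C(x_B))

gamma = 0.5
b = np.array([1.0, -3.0, 0.2])
J_A = lambda v: np.clip(v, -1.0, 1.0)  # resolvent of the normal cone of a box
J_B = lambda v: np.maximum(v, 0.0)     # resolvent of the normal cone of the orthant
C = lambda x: x - b                    # gradient of 0.5*||x - b||^2
identity = lambda v: v                 # resolvent of the zero operator (B = 0)
zero = lambda x: np.zeros_like(x)      # the zero operator (C = 0)

z = np.array([0.7, -0.4, 2.5])
T_fbs = J_A(z - gamma * C(z))               # forward-backward operator
T_drs = z - J_B(z) + J_A(2.0 * J_B(z) - z)  # Douglas-Rachford operator
fbs_matches = np.allclose(T(z, J_A, identity, C, gamma), T_fbs)
drs_matches = np.allclose(T(z, J_A, J_B, zero, gamma), T_drs)
```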

The operator $T$ is also related to the Peaceman-Rachford splitting (PRS) operator [32]. Let us introduce
the “reflection” operator $\operatorname{refl}_A := 2 J_A - I_{\mathcal{H}}$ where $A$ is a maximal monotone operator on $\mathcal{H}$, and set

$S := 2T - I = \operatorname{refl}_{\gamma A} \circ (\operatorname{refl}_{\gamma B} - \gamma C \circ J_{\gamma B}) - \gamma C \circ J_{\gamma B}$. (1.5)

If we set $C = 0$, then $S$ reduces to the PRS operator.


1.2 Convergence rate guarantees 

We show in Lemma 3.2 that from any fixed point $z^*$ of the operator $T$, we obtain $x^* := J_{\gamma B}(z^*)$ as a zero of
the monotone inclusion (1.1), i.e., $x^* \in \operatorname{zer}(A + B + C)$. In addition, under various scenarios, the following
convergence rates can be deduced: 


2 For example, Peaceman-Rachford splitting (PRS) [32] doubles the step size in DRS. 
1. Fixed-point residual (FPR) rate: The FPR $\|Tz^k - z^k\|^2$ has the sharp rate $o(1/\sqrt{k+1})$. (Part 7 of
Theorem 3.1 and Remark 3.5.)

2. Function value rate: Under mild conditions on Problem (1.2), although $(f + g + h)(x^k) - (f + g + h)(x^*)$
is not monotonic, it is bounded by $o(1/\sqrt{k+1})$. Two averaging procedures improve this rate to
$O(1/(k+1))$. The running best sequence, $\min_{i=0,\ldots,k} (f + g + h)(x^i) - (f + g + h)(x^*)$, further improves
to $o(1/(k+1))$ whenever $f$ is differentiable and $\nabla f$ is Lipschitz continuous. These rates are also sharp.

3. Strong convergence: When $A$ (respectively $B$ or $C$) is strongly monotone, the sequence $\|x_A^k - x^*\|^2$
(respectively $\|x_B^k - x^*\|^2$) converges with rate $o(1/\sqrt{k+1})$. The running best and averaged sequences
improve this rate to $o(1/(k+1))$ and $O(1/(k+1))$, respectively.

4. Linear convergence: We reserve $\mu \in [0, \infty)$ for strong monotonicity constants and $L \in (0, \infty]$ for
Lipschitz constants. If strong monotonicity does not hold, then $\mu = 0$. If Lipschitz continuity does not
hold, then $L = \infty$. Algorithm 1 converges linearly whenever $(\mu_A + \mu_B + \mu_C)(1/L_A + 1/L_B) > 0$, i.e.,
whenever at least one of $A$, $B$, or $C$ is strongly monotone and at least one of $A$ or $B$ is Lipschitz
continuous. We present a counterexample where $A$ and $B$ are not Lipschitz continuous and Algorithm 1
fails to converge linearly.

5. Variational inequality convergence rate: We can apply Algorithm 1 to primal-dual optimality conditions
and other structured monotone inclusions with $A = \bar{A} + \partial f$, $B = \bar{B} + \partial g$ and $C = \bar{C} + \nabla h$ for
some monotone operators $\bar{A}$, $\bar{B}$, and $\bar{C}$. A typical example is when $\bar{A}$ and $\bar{B}$ are bounded skew linear
maps and $\bar{C} = 0$. Then, the corresponding variational inequality converges with rate $o(1/\sqrt{k+1})$ under
mild conditions on $\bar{A}$ and $f$. Again, averaging can improve the rate to $O(1/(k+1))$.


1.3 Modifications and enhancements of the algorithm 
1.3.1 Averaging 

The averaging strategies in this subsection maintain additional running averages of the sequences $(x_A^j)_{j \ge 0}$
and $(x_B^j)_{j \ge 0}$ in Algorithm 1. Compared to the worst-case rate $o(1/\sqrt{k+1})$ of the original iterates, the
running averages have the improved rate of $O(1/(k+1))$, which is referred to as the ergodic rate. This
better rate, however, is often offset by worse practical performance, for the following reasons: (i) In
many finite dimensional applications, when the iterates reach a solution neighborhood, convergence improves
from sublinear to linear, but the ergodic rate typically stays sublinear at $O(1/(k+1))$; (ii) structures such
as sparsity and low-rankness in current iterates often get lost when they are averaged with all their past
iterates. This effect is dramatic in sparse optimization because the average of many sparse vectors can be
dense.

The following averaging scheme is typically used in the literature for splitting schemes [24,22,4]: 


$\bar{x}_B^k = \frac{1}{\sum_{i=0}^k \lambda_i} \sum_{i=0}^k \lambda_i x_B^i$ and $\bar{x}_A^k = \frac{1}{\sum_{i=0}^k \lambda_i} \sum_{i=0}^k \lambda_i x_A^i$, (1.6)

where all $\lambda_i$, $x_A^i$, and $x_B^i$ are given by Algorithm 1. By maintaining the running averages in Algorithm 1, $\bar{x}_B^k$
and $\bar{x}_A^k$ are essentially costless to compute.
The following averaging scheme, inspired by [34], uses a constant sequence of relaxation parameters $\lambda_i \equiv \lambda$
but it gives more weight to the later iterates:

$\bar{x}_B^k = \frac{2}{(k+1)(k+2)} \sum_{i=0}^k (i+1) x_B^i$ and $\bar{x}_A^k = \frac{2}{(k+1)(k+2)} \sum_{i=0}^k (i+1) x_A^i$. (1.7)

This seems intuitively better: the older iterates should matter less than the current iterates. The above ergodic
iterates are closer to the current iterate, but they maintain the improved convergence rate of $O(1/(k+1))$.
Like before, $\bar{x}_B^k$ and $\bar{x}_A^k$ can be computed by updating $\bar{x}_B^{k-1}$ and $\bar{x}_A^{k-1}$ at little cost.
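One way to realize the "little cost" update for the linearly weighted average is the recursion $\bar{x}^k = \frac{k}{k+2}\bar{x}^{k-1} + \frac{2}{k+2}x^k$, which follows from the weights $(i+1)$ and the normalization $2/((k+1)(k+2))$. The sketch below checks this recursion against the direct sum; the function name and the random stand-in iterates are assumptions for illustration only.

```python
import numpy as np

def update_weighted_average(avg_prev, x_new, k):
    """Return the linearly weighted average after iterate k (k = 0, 1, ...)."""
    if k == 0:
        return np.array(x_new, dtype=float)
    return (k / (k + 2.0)) * avg_prev + (2.0 / (k + 2.0)) * x_new

rng = np.random.default_rng(0)
iterates = rng.normal(size=(6, 3))  # stand-in for x_B^0, ..., x_B^5
avg = None
for k, x in enumerate(iterates):
    avg = update_weighted_average(avg, x, k)

# Direct evaluation of the weighted sum with weights (i + 1).
k = len(iterates) - 1
weights = np.arange(1, k + 2)
direct = 2.0 / ((k + 1) * (k + 2)) * (weights[:, None] * iterates).sum(axis=0)
```

Only the previous average and the newest iterate are stored, so the memory cost is one extra vector per averaged sequence.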

1.3.2 Some accelerations 


In this section we introduce an acceleration of Algorithm 1 that applies whenever $B$ or $C$ is strongly monotone.
If $f$ is strongly convex, then $S = \partial f$ is strongly monotone. Instead of fixing the step size $\gamma$, a varying sequence
of stepsizes $(\gamma_j)_{j \ge 0}$ is used for acceleration. The acceleration is significant on problems where Algorithm 1
works nearly at its performance lower bound and the strong convexity constants are easy to obtain. The
new algorithm is presented in variables different from those in Algorithm 1 since the change from $\gamma_k$ to $\gamma_{k+1}$
occurs in the middle of each iteration of Algorithm 1, right after $J_{\gamma B}$ is applied. In case that $\gamma_k = \gamma$ is fixed,
the new algorithm reduces to Algorithm 1 with a constant relaxation parameter $\lambda_k = 1$ via the change of
variable: $z^k = x_A^{k-1} + \gamma_{k-1} u_B^{k-1}$. The new algorithm is as follows:

Algorithm 2 (Algorithm 1 with acceleration) Choose $z^0 \in \mathcal{H}$ and stepsizes $(\gamma_j)_{j \ge 0} \in (0, \infty)$. Let $x_A^0 \in \mathcal{H}$
and set $x_B^0 = J_{\gamma_0 B}(x_A^0)$, $u_B^0 = (1/\gamma_0)(I - J_{\gamma_0 B})(x_A^0)$. For $k = 1, 2, \ldots$, iterate

1. get $x_B^k = J_{\gamma_{k-1} B}(x_A^{k-1} + \gamma_{k-1} u_B^{k-1})$;

2. get $u_B^k = (1/\gamma_{k-1})(x_A^{k-1} + \gamma_{k-1} u_B^{k-1} - x_B^k)$;

3. get $x_A^k = J_{\gamma_k A}(x_B^k - \gamma_k u_B^k - \gamma_k C x_B^k)$;

The sequence of stepsizes $(\gamma_j)_{j \ge 0}$, which is related to [12, Algorithm 2] and [5, Algorithm 5], is introduced
in Theorem 1.2. These stepsizes improve the convergence rate of $\|x_B^k - x^*\|^2$ to $O(1/(k+1)^2)$.

Theorem 1.2 (Accelerated variants of Algorithm 1) Let $B$ be $\mu_B$-strongly monotone, where we allow
the case $\mu_B = 0$.

1. Suppose that $C$ is $\beta$-cocoercive and $\mu_C$-strongly monotone. Let $\eta \in (0, 1)$ and choose $\gamma_0 \in (0, 2\beta(1 - \eta))$.
In Algorithm 2, for all $k \ge 0$, let

$\gamma_{k+1} := \frac{-2\gamma_k^2 \mu_C \eta + \sqrt{(2\gamma_k^2 \mu_C \eta)^2 + 4(1 + 2\gamma_k \mu_B)\gamma_k^2}}{2(1 + 2\gamma_k \mu_B)}$. (1.8)

Then we have $\|x_B^k - x^*\|^2 = O(1/(k+1)^2)$.

2. Suppose that $C$ is $L_C$-Lipschitz, but not necessarily strongly monotone or cocoercive. Suppose that $\mu_B > 0$.
Let $\gamma_0 \in (0, 2\mu_B/L_C^2)$. In Algorithm 2, for all $k \ge 0$, let

$\gamma_{k+1} := \frac{\gamma_k}{\sqrt{1 + 2\gamma_k(\mu_B - \gamma_k L_C^2/2)}}$. (1.9)

Then we have $\|x_B^k - x^*\|^2 = O(1/(k+1)^2)$.

The proof can be found in Appendix A.
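The two stepsize recursions of Theorem 1.2 are simple scalar updates. The sketch below encodes them as read from the theorem, with $\gamma_k^2 \mu_C \eta$ in the first rule (an assumption about the exponent, chosen so that every term has consistent units); the function names and the sample constants are hypothetical. With $\mu_C = 0$ the first rule collapses to $\gamma_{k+1} = \gamma_k/\sqrt{1 + 2\gamma_k\mu_B}$, and the resulting stepsizes decay roughly like $1/k$.

```python
import math

def next_gamma_cocoercive(gamma, mu_B, mu_C, eta):
    """Stepsize update of the form (1.8)."""
    a = 2.0 * gamma**2 * mu_C * eta
    d = 1.0 + 2.0 * gamma * mu_B
    return (-a + math.sqrt(a**2 + 4.0 * d * gamma**2)) / (2.0 * d)

def next_gamma_lipschitz(gamma, mu_B, L_C):
    """Stepsize update of the form (1.9)."""
    return gamma / math.sqrt(1.0 + 2.0 * gamma * (mu_B - gamma * L_C**2 / 2.0))

# Generate a stepsize sequence for sample (hypothetical) constants.
gammas = [0.5]
for _ in range(1000):
    gammas.append(next_gamma_cocoercive(gammas[-1], mu_B=1.0, mu_C=0.0, eta=0.5))
```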
1.4 Practical implementation issues: Line search

Recall that $\beta$, the cocoercivity constant of $C$, determines the stepsize condition $\gamma \in (0, 2\beta)$ for Algorithm 1.
When $\beta$ is unknown, one can find $\gamma$ by trial and error. Whenever the FPR is observed to increase (which
does not happen if $\gamma \in (0, 2\beta)$ by Part 2 of Theorem 3.1), reduce $\gamma$ and restart the algorithm from the initial
or last iterate.

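For the case of $C = \nabla h$ for some convex function $h$ with Lipschitz $\nabla h$, we propose a line search procedure
that uses a fixed stepsize $\gamma$ but involves an auxiliary factor $\rho \in (0, 1]$. It works better than the above approach
of changing $\gamma$ since the latter changes the fixed point. Let

$\operatorname{refl}_{\gamma B}^{\rho} := (1 + \rho) J_{\gamma B} - \rho I_{\mathcal{H}}$.

Note that $\operatorname{refl}_{\gamma B}^{1} = \operatorname{refl}_{\gamma B}$ and $\operatorname{refl}_{\gamma B}^{0} = J_{\gamma B}$. Define

$T_{\gamma}^{\rho} := I_{\mathcal{H}} - J_{\gamma B} + J_{\rho\gamma A} \circ (\operatorname{refl}_{\gamma B}^{\rho} - \rho\gamma \nabla h \circ J_{\gamma B})$.

Our line search procedure iterates $z^{k+1} = T_{\gamma}^{\rho}(z^k)$ with a special choice of $\rho$: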

Algorithm 3 (Algorithm 1 with line search) Choose $z^0 \in \mathcal{H}$ and $\gamma \in (0, \infty)$. For $k = 0, 1, \ldots$, iterate

1. get $x_B^k = J_{\gamma B}(z^k)$;

2. get $\rho \in (0, 1]$ such that

$h(x_A^k) \le h(x_B^k) + \langle x_A^k - x_B^k, \nabla h(x_B^k) \rangle + \frac{1}{2\gamma\rho}\|x_A^k - x_B^k\|^2$

where

$x_A^k = J_{\gamma\rho A}(x_B^k + \rho(x_B^k - z^k) - \gamma\rho \nabla h(x_B^k))$;

3. get $z^{k+1} = z^k + x_A^k - x_B^k$.
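A single iteration of the line search can be sketched with a simple backtracking rule on $\rho$. Several things here are assumptions rather than the authors' procedure: the halving factor `shrink`, the cap `max_backtracks`, the small tolerance added to the test, and the reading of the quadratic term in Step 2 as $\frac{1}{2\gamma\rho}\|x_A^k - x_B^k\|^2$. The function name and the toy data are hypothetical.

```python
import numpy as np

def line_search_step(z, gamma, prox_A, prox_B, h, grad_h, shrink=0.5, max_backtracks=60):
    """One iteration in the style of Algorithm 3, halving rho until the test holds."""
    x_B = prox_B(z, gamma)
    g = grad_h(x_B)
    rho = 1.0
    for _ in range(max_backtracks):
        x_A = prox_A(x_B + rho * (x_B - z) - gamma * rho * g, gamma * rho)
        d = x_A - x_B
        # Sufficient-decrease test on h with curvature estimate 1 / (gamma * rho).
        if h(x_A) <= h(x_B) + d @ g + (d @ d) / (2.0 * gamma * rho) + 1e-12:
            return z + x_A - x_B, x_B, rho
        rho *= shrink
    raise RuntimeError("backtracking did not find an acceptable rho")

# Hypothetical toy data: f = indicator of x >= 0, g = mu*||x||_1, h = 0.5*||x - b||^2.
b = np.array([3.0, -2.0, 0.5, 1.2])
mu = 1.0
prox_f = lambda v, t: np.maximum(v, 0.0)
prox_g = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t * mu, 0.0)
h = lambda x: 0.5 * np.sum((x - b) ** 2)
grad_h = lambda x: x - b

z_next, x_B, rho = line_search_step(np.zeros(4), 4.0, prox_f, prox_g, h, grad_h)
_, _, rho_small = line_search_step(np.zeros(4), 0.5, prox_f, prox_g, h, grad_h)
```

With $\gamma = 4$ and a 1-Lipschitz gradient, the test first passes once $\gamma\rho \le 1$, so the sketch settles on $\rho = 0.25$; with $\gamma = 0.5$ the first trial $\rho = 1$ is accepted.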

A straightforward calculation shows the following lemma: 

Lemma 1.1 For all $\rho \in (0, 1)$ and all $\gamma > 0$, we have

$\operatorname{zer}(A + B + \nabla h) = J_{\gamma B}(\operatorname{Fix} T_{\gamma}^{\rho})$ and $\operatorname{Fix}(T_{\gamma}^{\rho}) = \operatorname{Fix}(T_{\gamma}^{1})$.

Remark 1.1 In practice, Algorithm 3, which can start with a larger $\gamma$, can be an order of magnitude faster
than Algorithm 1. Unfortunately, we have no proof of convergence for this method.


1.5 Definitions, notation and some facts 

In what follows, $\mathcal{H}$ denotes a (possibly infinite dimensional) Hilbert space. We use $\langle \cdot, \cdot \rangle$ to denote the inner
product associated to a Hilbert space. In all of the algorithms we consider, we utilize two stepsize sequences:
the implicit sequence $(\gamma_j)_{j \ge 0} \subseteq \mathbb{R}_{++}$ and the explicit sequence $(\lambda_j)_{j \ge 0} \subseteq \mathbb{R}_{++}$.

The following definitions and facts are mostly standard and can be found in [3]. 

Let $L \ge 0$, and let $D$ be a nonempty subset of $\mathcal{H}$. A map $T : D \to \mathcal{H}$ is called $L$-Lipschitz if for all
$x, y \in D$, we have $\|Tx - Ty\| \le L\|x - y\|$. In particular, $T$ is called nonexpansive if it is 1-Lipschitz. A map
$N : D \to \mathcal{H}$ is called $\lambda$-averaged [3, Section 4.4] if it can be written as

$N = T_{\lambda} := (1 - \lambda) I_{\mathcal{H}} + \lambda T$ (1.10)

for a nonexpansive map $T : D \to \mathcal{H}$ and a real number $\lambda \in (0, 1)$. A $(1/2)$-averaged map is called firmly
nonexpansive. We use a $*$ superscript to denote a fixed point of a nonexpansive map, e.g., $z^*$.

Let $2^{\mathcal{H}}$ denote the power set of $\mathcal{H}$. A set-valued operator $A : \mathcal{H} \to 2^{\mathcal{H}}$ is called monotone if for all
$x, y \in \mathcal{H}$, $u \in Ax$, and $v \in Ay$, we have $\langle x - y, u - v \rangle \ge 0$. We denote the set of zeros of a monotone operator
by $\operatorname{zer}(A) := \{x \in \mathcal{H} \mid 0 \in Ax\}$. The graph of $A$ is denoted by $\operatorname{gra}(A) := \{(x, y) \mid x \in \mathcal{H}, y \in Ax\}$. Evidently,
$A$ is uniquely determined by its graph. A monotone operator $A$ is called maximal monotone provided that
$\operatorname{gra}(A)$ is not properly contained in the graph of any other monotone set-valued operator. The inverse of
$A$, denoted by $A^{-1}$, is defined uniquely by its graph $\operatorname{gra}(A^{-1}) := \{(y, x) \mid x \in \mathcal{H}, y \in Ax\}$. Let $\beta \in \mathbb{R}$ be a
positive real number. The operator $A$ is called $\beta$-strongly monotone provided that for all $x, y \in \mathcal{H}$, $u \in Ax$,
and $v \in Ay$, we have $\langle x - y, u - v \rangle \ge \beta\|x - y\|^2$. A single-valued operator $B : \mathcal{H} \to 2^{\mathcal{H}}$ maps each point in $\mathcal{H}$
to a singleton and will be identified with the natural $\mathcal{H}$-valued map it defines. The resolvent of a monotone
operator $A$ is defined by the inversion $J_A := (I + A)^{-1}$. Minty’s theorem shows that $J_A$ is single-valued and
has full domain $\mathcal{H}$ if, and only if, $A$ is maximally monotone. Note that $A$ is monotone if, and only if, $J_A$ is
firmly nonexpansive. Thus, the reflection operator

$\operatorname{refl}_A := 2 J_A - I_{\mathcal{H}}$ (1.11)

is nonexpansive on $\mathcal{H}$ whenever $A$ is maximally monotone.

Let $f : \mathcal{H} \to (-\infty, \infty]$ denote a closed (i.e., lower semi-continuous), proper, and convex function. Let
$\operatorname{dom}(f) := \{x \in \mathcal{H} \mid f(x) < \infty\}$. We let $\partial f : \mathcal{H} \to 2^{\mathcal{H}}$ denote the subdifferential of $f$: $\partial f(x) := \{u \in \mathcal{H} \mid \forall y \in \mathcal{H}, f(y) \ge f(x) + \langle y - x, u \rangle\}$. We always let

$\nabla f(x) \in \partial f(x)$

denote a subgradient of $f$ drawn at the point $x$. The subdifferential operator of $f$ is maximally monotone.
The inverse of $\partial f$ is given by $\partial f^*$ where $f^*(y) := \sup_{x \in \mathcal{H}} \langle y, x \rangle - f(x)$ is the Fenchel conjugate of $f$. If the
function $f$ is $\beta$-strongly convex, then $\partial f$ is $\beta$-strongly monotone and $\partial f^*$ is single-valued and $\beta$-cocoercive.

If a convex function $f : \mathcal{H} \to (-\infty, \infty]$ is Fréchet differentiable at $x \in \mathcal{H}$, then $\partial f(x) = \{\nabla f(x)\}$.
Suppose $f$ is convex and Fréchet differentiable on $\mathcal{H}$, and let $\beta \in \mathbb{R}$ be a positive real number. Then the
Baillon-Haddad theorem states that $\nabla f$ is $(1/\beta)$-Lipschitz if, and only if, $\nabla f$ is $\beta$-cocoercive.

The resolvent operator associated to $\partial f$ is called the proximal operator and is uniquely defined by the
following (strongly convex) minimization problem: $\operatorname{prox}_f(x) := J_{\partial f}(x) = \operatorname{argmin}_{y \in \mathcal{H}} f(y) + (1/2)\|y - x\|^2$.
The indicator function of a closed, convex set $C \subseteq \mathcal{H}$ is denoted by $\iota_C : \mathcal{H} \to \{0, \infty\}$; the indicator function
is 0 on $C$ and is $\infty$ on $\mathcal{H} \setminus C$. The normal cone operator of $C$ is the monotone operator $N_C := \partial\iota_C$.

Finally, we call the following identity the cosine rule:

$\|y - z\|^2 + 2\langle y - x, z - x \rangle = \|y - x\|^2 + \|z - x\|^2$, $\forall x, y, z \in \mathcal{H}$. (1.12)


2 Motivation and Applications 

Our splitting scheme provides simple numerical solutions to a large number of problems that appear in signal 
processing, machine learning, and statistics. In this section, we provide some concrete problems that reduce 
to the monotone inclusion problem (1.1). These are a small fraction of the problems to which our algorithm 
will apply. For example, when a problem has four or more blocks, we can reduce it to three or fewer blocks 
by grouping similar components or lifting the problem to a higher-dimensional space. 
For every method, we list the three monotone operators $A$, $B$, and $C$ from problem (1.1), and a minimal
list of conditions needed to guarantee convergence.

We do not include any examples with only one or two blocks because they can be solved by existing splitting
algorithms that are special cases of our algorithm.


2.1 The 3-set (split) feasibility problem 


This problem is to find

$x \in C_1 \cap C_2 \cap C_3$, (2.1)

where $C_1, C_2, C_3$ are three nonempty convex sets and the projection to each set can be computed numerically.
The more general 3-set split feasibility problem is to find

$x \in C_1 \cap C_2$ such that $Lx \in C_3$, (2.2)

where $L$ is a linear mapping. We can reformulate the problem as

$\underset{x}{\text{minimize}} \; \frac{1}{2} d^2(Lx, C_3)$ subject to $x \in C_1 \cap C_2$, (2.3)

where $d(Lx, C_3) := \|Lx - P_{C_3}(Lx)\|$ and $P_{C_3}$ denotes the projection to $C_3$. Problem (2.2) has a solution if
and only if problem (2.3) has a solution that gives 0 objective value.

The following algorithm is an instance of Algorithm 1 applied with the monotone operators:

$Ax := N_{C_1}(x)$; $Bx := N_{C_2}(x)$; $Cx := \nabla_x \frac{1}{2} d^2(Lx, C_3) = L^*(Lx - P_{C_3}(Lx))$.

Algorithm 4 (3-set split feasibility algorithm) Set an arbitrary $z^0 \in \mathcal{H}$, stepsize $\gamma \in (0, 2/\|L\|^2)$, and
sequence of relaxation parameters $(\lambda_j)_{j \ge 0} \in (0, 2 - \gamma\|L\|^2/2)$. For $k = 0, 1, \ldots$, iterate

1. get $x^k = P_{C_2}(z^k)$;

2. get $y^k = Lx^k$;

3. get $z^{k+1/2} = 2x^k - z^k - \gamma L^*(y^k - P_{C_3}(y^k))$; //comment: $z^{k+1/2} = (2 J_{\gamma B} - I_{\mathcal{H}} - \gamma C \circ J_{\gamma B}) z^k$

4. get $z^{k+1} = z^k + \lambda_k (P_{C_1}(z^{k+1/2}) - x^k)$.

Note that the algorithm only explicitly applies $L$ and $L^*$, the adjoint of $L$, and does not need to invert
a map involving $L$ or $L^*$. The stepsize rule $\gamma \in (0, 2/\|L\|^2)$ follows because $\nabla_x \frac{1}{2} d^2(x, C_3)$ is 1-Lipschitz [3,
Corollary 12.30].
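The four steps of Algorithm 4 translate almost line for line into code. The following is a small sketch on an assumed toy instance in $\mathbb{R}^2$: $C_1$ is the unit box, $C_2$ is the hyperplane $x_1 + x_2 = 1$, $C_3 = \{y : y \ge 0.5\}$, and $L$ picks out the first coordinate. The function name `split_feasibility`, the projections, and the starting point are all hypothetical choices for illustration.

```python
import numpy as np

def split_feasibility(P1, P2, P3, L, z0, gamma, lam=1.0, iters=200):
    """Iterate in the style of Algorithm 4 using only projections, L, and its transpose."""
    z = np.asarray(z0, dtype=float).copy()
    for _ in range(iters):
        x = P2(z)                                           # step 1
        y = L @ x                                           # step 2
        z_half = 2.0 * x - z - gamma * (L.T @ (y - P3(y)))  # step 3
        z = z + lam * (P1(z_half) - x)                      # step 4
    return P2(z)

L = np.array([[1.0, 0.0]])                       # Lx = x_1, so ||L|| = 1
P1 = lambda v: np.clip(v, 0.0, 1.0)              # projection onto the unit box
P2 = lambda v: v - (v.sum() - 1.0) / v.size      # projection onto {x : x_1 + x_2 = 1}
P3 = lambda y: np.maximum(y, 0.5)                # projection onto {y : y >= 0.5}

x_feas = split_feasibility(P1, P2, P3, L, np.array([-1.0, 3.0]), gamma=1.0)
```

The stepsize $\gamma = 1$ lies inside $(0, 2/\|L\|^2) = (0, 2)$ for this instance, and no linear system involving $L$ is ever solved.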


2.2 The 3-objective minimization problem 
The problem is to find a solution to

$\underset{x}{\text{minimize}} \; f(x) + g(x) + h(Lx)$, (2.4)

where $f, g, h$ are proper closed convex functions, $h$ is $(1/\beta)$-Lipschitz-differentiable, and $L$ is a linear
mapping. Note that any constraint $x \in C$ can be written as the indicator function $\iota_C(x)$ and incorporated in
$f$ or $g$. Therefore, the problem (2.3) is a special case of (2.4).
The following algorithm is an instance of Algorithm 1 applied with the monotone operators:

$A = \partial f$; $B = \partial g$; $C = \nabla(h \circ L) = L^* \circ \nabla h \circ L$.

Algorithm 5 (for problem (2.4)) Set an arbitrary $z^0$, stepsize $\gamma \in (0, 2\beta/\|L\|^2)$, and sequence of relaxation
parameters $(\lambda_j)_{j \ge 0} \in (0, 2 - \gamma\|L\|^2/(2\beta))$. For $k = 0, 1, \ldots$, iterate

1. get $x^k = \operatorname{prox}_{\gamma g}(z^k)$;

2. get $y^k = Lx^k$;

3. get $z^{k+1/2} = 2x^k - z^k - \gamma L^* \nabla h(y^k)$; //comment: $z^{k+1/2} = (2 J_{\gamma B} - I_{\mathcal{H}} - \gamma C \circ J_{\gamma B}) z^k$

4. get $z^{k+1} = z^k + \lambda_k(\operatorname{prox}_{\gamma f}(z^{k+1/2}) - x^k)$.

These parameter ranges follow from Algorithm 1 because $C$ is $(\beta/\|L\|^2)$-cocoercive when $\nabla h$ is $(1/\beta)$-Lipschitz.
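As an illustration of Algorithm 5 with a nontrivial $L$, the sketch below solves a small separable problem whose minimizer is known in closed form. Everything specific here is an assumption for the example: the name `algorithm5`, the diagonal matrix `L`, the data `b` and `mu`, and the choice of $\gamma$ as the reciprocal of the Lipschitz constant of $\nabla(h \circ L)$, which stays strictly inside the admissible stepsize interval.

```python
import numpy as np

def algorithm5(prox_f, prox_g, grad_h, L, z0, gamma, lam=1.0, iters=2000):
    """Three-operator iteration for minimize f(x) + g(x) + h(Lx)."""
    z = np.asarray(z0, dtype=float).copy()
    for _ in range(iters):
        x = prox_g(z, gamma)                               # step 1
        y = L @ x                                          # step 2
        z_half = 2.0 * x - z - gamma * (L.T @ grad_h(y))   # step 3
        z = z + lam * (prox_f(z_half, gamma) - x)          # step 4
    return prox_g(z, gamma)

# Hypothetical instance: f = indicator of x >= 0, g = mu*||x||_1, h(y) = 0.5*||y - b||^2.
L = np.diag([2.0, 1.0, 0.5])
b = np.array([3.0, 1.0, 4.0])
mu = 1.0
prox_f = lambda v, t: np.maximum(v, 0.0)
prox_g = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t * mu, 0.0)
grad_h = lambda y: y - b

lipschitz = np.linalg.norm(L, 2) ** 2  # Lipschitz constant of grad(h o L); grad h is 1-Lipschitz
gamma = 1.0 / lipschitz
x_hat = algorithm5(prox_f, prox_g, grad_h, L, np.zeros(3), gamma)

# Coordinate-wise minimizer of mu*x + 0.5*(l*x - b)^2 over x >= 0.
l = np.diag(L)
x_closed_form = np.maximum((l * b - mu) / l**2, 0.0)
```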

2.2.1 Application: double-regularization and multi-regularization 


Regularization helps recover a signal with the structures that are either known a priori or sought after. 
In practice, regularization is often enforced through nonsmooth objective functions, such as $\ell_1$ and nuclear
norms, or constraints, such as nonnegativity, bound, linear, and norm constraints. Many problems involve 
more than one regularization term (counting both objective functions and constraints), in order to reduce 
the “search space” and more accurately shape their solutions. Such problems have the general form 


$\underset{x \in \mathcal{H}}{\text{minimize}} \; \sum_{i=1}^m r_i(x) + h_0(Lx)$, (2.5)

where $r_i$ are possibly-nonsmooth regularization functions and $h_0$ is a Lipschitz differentiable function. When
$m = 1, 2$, our algorithms can be directly applied to (2.5) by setting $f = r_1$ and $g = r_2$ in Algorithm 5.

When $m \ge 3$, a simple approach is to introduce variables $x_{(i)}$, $i = 1, \ldots, m$, and apply Algorithm 5 to
either of the following problems, both of which are equivalent to (2.5):


minimize $\underbrace{\sum_{i=1}^m r_i(x_{(i)})}_{f} + \underbrace{\iota_{\{x = x_{(1)} = \cdots = x_{(m)}\}}(x, x_{(1)}, \ldots, x_{(m)})}_{g} + h_0(Lx)$, (2.6)

minimize $\underbrace{\sum_{i=1}^m \left( r_i(x_{(i)}) + \frac{1}{m} h_0(Lx_{(i)}) \right)}_{f + h \circ L} + \underbrace{\iota_{\{x_{(1)} = \cdots = x_{(m)}\}}(x_{(1)}, \ldots, x_{(m)})}_{g}$, (2.7)

where $g$ returns 0 if all the inputs are identical and $\infty$ otherwise. Problem (2.6) has a simpler form, but
problem (2.7) requires fewer variables and will be strongly convex in the product space whenever $h_0(Lx)$ is
strongly convex in $x$.

It is easy to adapt Algorithm 5 for problems (2.7) and (2.6). We give the one for problem (2.7):

Algorithm 6 (for problem (2.7)) Set arbitrary $z_{(1)}^0, \ldots, z_{(m)}^0$, stepsize $\gamma \in (0, 2m\beta/\|L\|^2)$, and sequence
of relaxation parameters $(\lambda_j)_{j \ge 0} \in (0, 2 - \gamma\|L\|^2/(2m\beta))$, where $\nabla h_0$ is $(1/\beta)$-Lipschitz. For $k = 0, 1, \ldots$, iterate

1. get $x_{(1)}^k, \ldots, x_{(m)}^k = \frac{1}{m}(z_{(1)}^k + \cdots + z_{(m)}^k)$;

2. get $z_{(i)}^{k+1/2} = 2x_{(i)}^k - z_{(i)}^k - \frac{\gamma}{m} L^* \nabla h_0(Lx_{(i)}^k)$ and $z_{(i)}^{k+1} = z_{(i)}^k + \lambda_k(\operatorname{prox}_{\gamma r_i}(z_{(i)}^{k+1/2}) - x_{(i)}^k)$, for $i = 1, \ldots, m$
in parallel.

Because Step 1 yields identical $x_{(1)}^k, \ldots, x_{(m)}^k$, they can be consolidated to a single $x^k$ in both steps. For the
same reason, splitting $h_0(L\cdot)$ into multiple copies does not incur more computation.
2.2.2 Application: texture inpainting

Let $y$ be a color texture image represented as a 3-way tensor where $y(:,:,1)$, $y(:,:,2)$, $y(:,:,3)$ are the red,
green, and blue channels of the image, respectively. Let $P_\Omega$ be the linear operator that selects the set of
known entries of $y$, that is, $P_\Omega y$ is given. The inpainting problem is to recover a set of unknown entries of
$y$. Because the matrix unfoldings of the texture image $y$ are (nearly) low-rank (as in [33, Equation (4)]), we
formulate the inpainting problem as

$\underset{x}{\text{minimize}} \; \omega\|x_{(1)}\|_* + \omega\|x_{(2)}\|_* + \frac{1}{2}\|P_\Omega x - P_\Omega y\|^2$, (2.8)

where $x$ is the 3-way tensor variable, $x_{(1)}$ is the matrix $[x(:,:,1) \; x(:,:,2) \; x(:,:,3)]$, $x_{(2)}$ is the matrix
$[x(:,:,1)^T \; x(:,:,2)^T \; x(:,:,3)^T]^T$, $\|\cdot\|_*$ denotes matrix nuclear norm, and $\omega$ is a penalty parameter. Problem (2.8)
can be solved by Algorithm 5. The proximal mapping of the term $\|\cdot\|_*$ can be computed by singular value
soft-thresholding. Our numerical results are given in Section 5.1.
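Singular value soft-thresholding has a short standard implementation: take an SVD, shrink the singular values by the threshold, and reassemble. The sketch below is a generic NumPy version; the function name `prox_nuclear` and the test matrices are assumptions for illustration and are not taken from the paper's experiments.

```python
import numpy as np

def prox_nuclear(X, t):
    """Proximal map of t * (nuclear norm): soft-threshold the singular values of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Scale column j of U by the shrunken singular value s_j, then recombine.
    return (U * np.maximum(s - t, 0.0)) @ Vt

X = np.diag([3.0, 1.0, 0.2])
X_shrunk = prox_nuclear(X, 0.5)               # singular values 3, 1, 0.2 shrink by 0.5
low_rank = prox_nuclear(np.ones((4, 3)), 0.1) # an all-ones matrix stays rank one
```

Small singular values are set exactly to zero, which is why this proximal step tends to produce low-rank iterates.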


2.2.3 Matrix completion 

Let $X_0 \in \mathbb{R}^{m \times n}$ be a matrix with entries that lie in the interval $[l, u]$, where $l < u$ are positive real numbers.
Let $\mathcal{A}$ be a linear map that “selects” a subset of the entries of an $m \times n$ matrix by setting each unknown
entry in the matrix to 0. We are interested in recovering the matrix $X_0$ from the matrix of “known” entries
$\mathcal{A}(X_0)$. Mathematically, one approach to solve this problem is as follows [11]:

$\underset{X \in \mathbb{R}^{m \times n}}{\text{minimize}} \; \frac{1}{2}\|\mathcal{A}(X - X_0)\|^2 + \mu\|X\|_*$
subject to: $l \le X \le u$ (2.9)

where $\mu > 0$ is a parameter, $\|\cdot\|$ is the Frobenius norm, and $\|\cdot\|_*$ is the nuclear norm. Problem (2.9) can
be solved by Algorithm 5. The proximal operator of $\|\cdot\|_*$ can be computed by soft thresholding the
singular values of $X$. Our numerical results are given in Section 5.2.


2.2.4 Application: support vector machine classification and portfolio optimization 
Consider the constrained quadratic program in $\mathbb{R}^d$:

$\underset{x \in \mathbb{R}^d}{\text{minimize}} \; \frac{1}{2}\langle Qx, x \rangle + \langle c, x \rangle$ (2.10)
subject to $x \in C_1 \cap C_2$

where $Q \in \mathbb{R}^{d \times d}$ is a symmetric positive semi-definite matrix, $c \in \mathbb{R}^d$ is a vector, and $C_1, C_2 \subseteq \mathbb{R}^d$ are
constraint sets. Problem (2.10) arises in the dual form of the soft-margin kernelized support vector machine classifier [21]
in which $C_1$ is a box constraint and $C_2$ is a linear constraint. It also arises in portfolio optimization
problems in which $C_1$ is a single linear inequality constraint and $C_2$ is the standard simplex. See Sections 5.3
and 5.4 for more details.
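For the portfolio instance, the backward step on $C_2$ is a projection onto the standard simplex. The text does not say how this projection is computed; the sketch below uses the common sort-based method as an assumption, and the function name `project_simplex` is hypothetical.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1} via sorting."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                    # entries in decreasing order
    css = np.cumsum(u) - 1.0                # shifted cumulative sums
    idx = np.arange(1, v.size + 1)
    last = np.nonzero(u - css / idx > 0)[0][-1]  # last index kept positive
    theta = css[last] / (last + 1.0)        # common shift applied to all entries
    return np.maximum(v - theta, 0.0)

p_inside = project_simplex([0.2, 0.3, 0.5])   # already on the simplex
p_vertex = project_simplex([2.0, 0.0, 0.0])   # lands on a vertex
p_center = project_simplex([0.5, 0.5, 0.5])   # lands on the barycenter
```

Plugging such a projection in as $\operatorname{prox}_{\gamma g}$, with the linear inequality handled by $\operatorname{prox}_{\gamma f}$ and the quadratic by its gradient, gives an instance of Algorithm 5 for (2.10).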
2.3 Simplest 3-block extension of ADMM

The 3-block monotropic program has the form

$\underset{x_1, x_2, x_3}{\text{minimize}} \; f_1(x_1) + f_2(x_2) + f_3(x_3)$ (2.11a)
subject to $L_1 x_1 + L_2 x_2 + L_3 x_3 = b$, (2.11b)

where $\mathcal{H}_1, \ldots, \mathcal{H}_4$ are Hilbert spaces, the vector $b \in \mathcal{H}_4$ is given and for $i = 1, 2, 3$, the functions $f_i : \mathcal{H}_i \to (-\infty, \infty]$
are proper closed convex functions, and $L_i : \mathcal{H}_i \to \mathcal{H}_4$ are linear mappings. As usual, any constraint
$x_i \in C_i$ can be enforced through an indicator function $\iota_{C_i}(x_i)$ and incorporated in $f_i$. We assume that $f_1$ is
$\mu$-strongly convex where $\mu > 0$.

A new 3-block ADMM algorithm is obtained by applying Algorithm 1 to the dual formulation of (2.11) and
rewriting the resulting algorithm using the original functions in (2.11). Let $f^*$ denote the convex conjugate
of a function $f$, and let

$d_1(w) := f_1^*(L_1^* w)$, $d_2(w) := f_2^*(L_2^* w)$, $d_3(w) := f_3^*(L_3^* w) - \langle w, b \rangle$.

The dual problem of (2.11) is

$\underset{w}{\text{minimize}} \; d_1(w) + d_2(w) + d_3(w)$. (2.12)

Since $f_1$ is $\mu$-strongly convex, $\nabla d_1$ is $(\|L_1\|^2/\mu)$-Lipschitz continuous and, hence, the problem (2.12) is a special
case of (2.4). We can adapt Algorithm 5 to (2.12) to get:

Algorithm 7 (for problem (2.12)) Set an arbitrary $z^0$ and stepsize $\gamma \in (0, 2\mu/\|L_1\|^2)$. For $k = 0, 1, \ldots$,
iterate

1. get $w^k = \operatorname{prox}_{\gamma d_3}(z^k)$;

2. get $z^{k+1/2} = 2w^k - z^k - \gamma \nabla d_1(w^k)$;

3. get $z^{k+1} = z^k + \operatorname{prox}_{\gamma d_2}(z^{k+1/2}) - w^k$.

The following well-known proposition helps implement Algorithm 7 using the original objective functions 
instead of the dual functions di. 

Proposition 2.1 Let $f$ be a closed proper convex function and let $d(w) := f^*(A^* w) - \langle w, c \rangle$.

1. Any $x' \in \operatorname{argmin}_x f(x) - \langle w, Ax - c \rangle$ obeys $Ax' - c \in \partial d(w)$. If $f$ is strictly convex, then $Ax' - c = \nabla d(w)$.

2. Any $x'' \in \operatorname{argmin}_x f(x) + \frac{\gamma}{2}\|Ax - c - (1/\gamma) y\|^2$ obeys $Ax'' - c \in \partial d(\operatorname{prox}_{\gamma d}(y))$ and $\operatorname{prox}_{\gamma d}(y) = y - \gamma(Ax'' - c)$.

(We use “$\in$” with “argmin” since the minimizers are not unique in general.)

For notational simplicity, let

$s_\gamma(x_1, x_2, x_3, w) := L_1 x_1 + L_2 x_2 + L_3 x_3 - b - \frac{1}{\gamma} w$.

By Proposition 2.1 and algebraic manipulation, we derive the following algorithm from Algorithm 7.

Algorithm 8 (3-block ADMM) Set an arbitrary $w^0$ and $x_3^0$, as well as stepsize $\gamma \in (0, 2\mu/\|L_1\|^2)$. For
$k = 0, 1, \ldots$, iterate

1. get $x_1^{k+1} = \operatorname{argmin}_{x_1} f_1(x_1) - \langle w^k, L_1 x_1 \rangle$;

2. get $x_2^{k+1} \in \operatorname{argmin}_{x_2} f_2(x_2) + \frac{\gamma}{2}\|s_\gamma(x_1^{k+1}, x_2, x_3^k, w^k)\|^2$;

3. get $x_3^{k+1} \in \operatorname{argmin}_{x_3} f_3(x_3) + \frac{\gamma}{2}\|s_\gamma(x_1^{k+1}, x_2^{k+1}, x_3, w^k)\|^2$;

4. get $w^{k+1} = w^k - \gamma(L_1 x_1^{k+1} + L_2 x_2^{k+1} + L_3 x_3^{k+1} - b)$.

Note that Step 1 does not involve a quadratic penalty term, and it returns a unique solution since $f_1$ is
strongly convex. In contrast, Steps 2 and 3 involve quadratic penalty terms and may have multiple solutions
(though the products $L_2 x_2^{k+1}$ and $L_3 x_3^{k+1}$ are still unique).
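To make the four updates concrete, the sketch below runs a 3-block ADMM of this type on quadratic objectives $f_i(x_i) = \frac{c_i}{2}\|x_i - a_i\|^2$, for which every subproblem has a closed form. It assumes the sign convention in which the multiplier enters Step 1 as $-\langle w^k, L_1 x_1 \rangle$, which is the one consistent with the multiplier update $w^{k+1} = w^k - \gamma(\cdot)$. The function name `admm3_quadratic`, the data, and the iteration count are hypothetical.

```python
import numpy as np

def admm3_quadratic(a, c, L, b, gamma, iters=200):
    """3-block ADMM sketch for f_i(x_i) = (c_i / 2) * ||x_i - a_i||^2."""
    L1, L2, L3 = L
    w = np.zeros_like(b, dtype=float)
    x3 = np.zeros_like(a[2], dtype=float)
    for _ in range(iters):
        # Step 1: no quadratic penalty; closed form for the strongly convex f_1.
        x1 = a[0] + (L1.T @ w) / c[0]
        # Step 2: minimize f_2(x_2) + (gamma / 2) * ||L2 x2 + r2||^2.
        r2 = L1 @ x1 + L3 @ x3 - b - w / gamma
        x2 = np.linalg.solve(c[1] * np.eye(L2.shape[1]) + gamma * L2.T @ L2,
                             c[1] * a[1] - gamma * L2.T @ r2)
        # Step 3: same form for the third block, using the fresh x2.
        r3 = L1 @ x1 + L2 @ x2 - b - w / gamma
        x3 = np.linalg.solve(c[2] * np.eye(L3.shape[1]) + gamma * L3.T @ L3,
                             c[2] * a[2] - gamma * L3.T @ r3)
        # Step 4: multiplier update with the full constraint residual.
        w = w - gamma * (L1 @ x1 + L2 @ x2 + L3 @ x3 - b)
    return x1, x2, x3, w

# Toy problem: minimize 0.5 * sum_i (x_i - a_i)^2 subject to x_1 + x_2 + x_3 = 9.
one = np.array([[1.0]])
a = [np.array([1.0]), np.array([2.0]), np.array([3.0])]
x1, x2, x3, w = admm3_quadratic(a, [1.0, 1.0, 1.0], [one, one, one],
                                np.array([9.0]), gamma=1.0)
```

For this toy problem the solution is $x = (2, 3, 4)$ with multiplier $w = 1$, and $\gamma = 1$ lies inside $(0, 2\mu/\|L_1\|^2) = (0, 2)$.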

Proposition 2.2 If the initial points of Algorithms 7 and 8 satisfy $z^0 = w^0 + \gamma(L_3 x_3^0 - b)$, then the two
algorithms give the same sequence $\{w^k\}_{k \ge 0}$.

The proposition is a well-known result based on Proposition 2.1 and algebraic manipulations; the interested 
reader is referred to [24, Proposition 11]. The convergence of Algorithm 8 is given in the following theorem. 

Theorem 2.1 Let $\mathcal{H}_1, \ldots, \mathcal{H}_4$ be Hilbert spaces, let $f_i : \mathcal{H}_i \to (-\infty, \infty]$ be proper closed convex functions, $i = 1, 2, 3$, and assume that $f_1$ is $\mu$-strongly convex. Suppose that the set $S^*$ of the saddle-point solutions $(x_1, x_2, x_3, w) \in \mathcal{H}_1 \times \cdots \times \mathcal{H}_4$ to (2.11) is nonempty. Let $\rho = \|L_1\|^2/\mu > 0$ and pick $\gamma$ satisfying

$$0 < \gamma < \frac{2}{\rho}. \tag{2.13}$$

Then the sequences $\{w^k\}_{k \geq 0}$, $\{L_2x_2^k\}_{k \geq 0}$, and $\{L_3x_3^k\}_{k \geq 0}$ of Algorithm 8 converge weakly to $w^*$, $L_2x_2^*$, and $L_3x_3^*$, and $\{x_1^k\}_{k \geq 0}$ converges strongly to $x_1^*$, for some $(x_1^*, x_2^*, x_3^*, w^*) \in S^*$.

Remark 2.1 Note that it is possible to replace Step 4 of Algorithm 8 with the update rule $w^{k+1} = w^k - \alpha\gamma(L_1x_1^{k+1} + L_2x_2^{k+1} + L_3x_3^{k+1} - b)$ where $\alpha \in (0, \bar{\alpha})$ and

$$\bar{\alpha} = (2(1 - \rho\gamma))^{-1}\left(1 - 2\rho\gamma + \sqrt{(1 - 2\rho\gamma)^2 + 4(1 - \rho\gamma)}\right) > 1,$$

for $\rho = \|L_1\|^2/\mu$. We do not pursue this generalization here due to lack of space.

Algorithm 8 generalizes several other algorithms of the alternating direction type. 

Proposition 2.3

1. Tseng’s alternating minimization algorithm is a special case of Algorithm 8 if the $x_3$-block vanishes.

2. The (standard) ADMM is a special case of Algorithm 8 if the $x_1$-block vanishes.

3. The augmented Lagrangian method (i.e., the method of multipliers) is a special case of Algorithm 8 if the $x_1$- and $x_2$-blocks vanish.

4. The Uzawa (dual gradient ascent) algorithm is a special case of Algorithm 8 if the $x_2$- and $x_3$-blocks vanish.

Recently, it was shown that the direct extension of ADMM to three blocks does not necessarily converge [14]. Compared to the recent work [10,15,27,29,31] on convergent 3-block extensions of ADMM, Algorithm 8 is the simplest and works under the weakest assumption. The first subproblem in Algorithm 8 does not involve $L_2$ or $L_3$, so it is simpler than the typical ADMM subproblem. While $f_1$ needs to be strongly convex, no additional assumptions on $f_2, f_3$ and $L_1, L_2, L_3$ are required for the extension. In comparison, [27] assumes that $f_1, f_2, f_3$ are strongly convex functions. The condition is relaxed to two strongly convex functions in [15,31], while [15] also needs $L_1$ to have full column rank. The papers [29,10] further reduce the condition to one strongly convex function; [29] uses proximal terms in all three subproblems and assumes some positive definiteness conditions, and [10] assumes that the matrices $L_2$ and $L_3$ have full column rank. A variety of convergence rates are established in these papers. It is worth noting that the conditions assumed by the other ADMM extensions, beyond the strong convexity of $f_1$, are not sufficient for linear convergence, so in theory they do not necessarily converge faster. In fact, some of the papers use additional conditions in order to prove linear convergence.

2.3.1 An $m$-block ADMM with $(m - 2)$ strongly convex objective functions

There is a great benefit to not having a quadratic penalty term in Step 1 of Algorithm 8. When $f_1(x_1)$ is separable, Step 1 decomposes into independent sub-steps. Consider the extended monotropic program

$$\text{minimize} \quad f_1(x_1) + f_2(x_2) + \cdots + f_m(x_m) \tag{2.14a}$$

$$\text{subject to} \quad L_1x_1 + L_2x_2 + \cdots + L_mx_m = b, \tag{2.14b}$$

where $f_1, \ldots, f_{m-2}$ are strongly convex and $f_{m-1}, f_m$ are convex (but not necessarily strongly convex). Problem (2.14) is a special case of problem (2.11) if we group the first $m - 2$ blocks. Specifically, we let $\bar{f}_1(\bar{x}_1) := f_1(x_1) + \cdots + f_{m-2}(x_{m-2})$, $\bar{f}_2(\bar{x}_2) := f_{m-1}(x_{m-1})$, $\bar{f}_3(\bar{x}_3) := f_m(x_m)$, and define $\bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{L}_1, \bar{L}_2, \bar{L}_3$ in obvious ways. Define $s_\gamma(x_1, \ldots, x_m, w) := L_1x_1 + L_2x_2 + \cdots + L_mx_m - b - \frac{1}{\gamma}w$. Then, it is straightforward to adapt Algorithm 8 for problem (2.14) as:

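Algorithm 9 ($m$-block ADMM) Set an arbitrary $w^0$ and $x_m^0$, and stepsize $\gamma \in (0, \min\{2\mu_i/\|L_i\|^2 \mid i = 1, \ldots, m - 2\})$, where $\mu_i$ denotes the strong convexity constant of $f_i$. For $k = 0, 1, \ldots$, iterate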

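1. get $x_i^{k+1} = \arg\min_{x_i} f_i(x_i) - \langle w^k, L_ix_i\rangle$ for $i = 1, 2, \ldots, m - 2$, in parallel;

2. get $x_{m-1}^{k+1} \in \arg\min_{x_{m-1}} f_{m-1}(x_{m-1}) + \frac{\gamma}{2}\|s_\gamma(x_1^{k+1}, \ldots, x_{m-2}^{k+1}, x_{m-1}, x_m^k, w^k)\|^2$;

3. get $x_m^{k+1} \in \arg\min_{x_m} f_m(x_m) + \frac{\gamma}{2}\|s_\gamma(x_1^{k+1}, \ldots, x_{m-1}^{k+1}, x_m, w^k)\|^2$;

4. get $w^{k+1} = w^k - \gamma(L_1x_1^{k+1} + L_2x_2^{k+1} + \cdots + L_mx_m^{k+1} - b)$.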

All convergence properties of Algorithm 9 are identical to those of Algorithm 8 . 


2.4 Reducing the number of operators before splitting 


Problems involving multiple operators can be reduced to fewer operators by applying grouping and lifting 
techniques. They allow Algorithm 1 and existing splitting schemes to handle four or more operators. 

In general, two or more Lipschitz-differentiable functions (or cocoercive operators) can be grouped into 
one function (or one cocoercive operator, respectively). On the other hand, grouping nonsmooth functions 
with simple proximal maps (or monotone operators with simple resolvent maps) may lead to a much more 
difficult proximal map (or resolvent map, respectively). One resolution is lifting: to introduce dual and dummy 
variables and create fewer but “larger” operators. It comes with the cost that the introduced variables increase 
the problem size and may slow down convergence. 

For example, we can reformulate Problem (1.1) in the form (which abuses the block matrix notation):

$$0 \in \begin{bmatrix} B & I \\ -I & A^{-1} \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} Cx \\ 0 \end{bmatrix} =: \bar{A}\begin{bmatrix} x \\ y \end{bmatrix} + \bar{C}\begin{bmatrix} x \\ y \end{bmatrix}. \tag{2.15}$$

Here we have introduced $y \in Ax$, which is equivalent to $x \in A^{-1}y$ or the second row of (2.15). Both the operators $\bar{A}$ and $\bar{C}$ are monotone, and the operator $\bar{C}$ is cocoercive since $C$ is so. Therefore, the problem (1.1) has been reduced to a monotone inclusion involving two “larger” operators. Under a special metric, applying the FBS iteration in [20] gives the following algorithm:

Algorithm 10 ([20]) Set an arbitrary $x^0$, $y^0$. Set stepsize parameters $\tau$, $\sigma$. For $k = 1, \ldots$, iterate:

1. get $x^k = J_{\tau B}(x^{k-1} - \tau Cx^{k-1} - \tau y^{k-1})$;

2. get $y^k = J_{\sigma A^{-1}}(y^{k-1} + \sigma(2x^k - x^{k-1}))$ //comment: $J_{\sigma A^{-1}} = I - \sigma J_{\sigma^{-1}A} \circ (\sigma^{-1}I)$.
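The identity in the comment, $J_{\sigma A^{-1}} = I - \sigma J_{\sigma^{-1}A} \circ (\sigma^{-1}I)$, can be checked numerically in a simple case. The sketch below assumes the illustrative scalar choice $A = \partial|\cdot|$, for which $J_{\sigma^{-1}A}$ is soft-thresholding with threshold $1/\sigma$ and $A^{-1}$ is the normal cone of $[-1, 1]$, so $J_{\sigma A^{-1}}$ is the projection onto $[-1, 1]$. The function names are hypothetical.

```python
def soft_threshold(v, t):
    """Resolvent of t * d|.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def resolvent_inverse_direct(y, sigma):
    """J_{sigma A^{-1}} for A = d|.|: A^{-1} is the normal cone of [-1, 1],
    so its resolvent is the projection onto [-1, 1] for every sigma > 0."""
    return max(-1.0, min(1.0, y))


def resolvent_inverse_via_identity(y, sigma):
    """The same map computed as (I - sigma * J_{A/sigma} o (I/sigma))(y)."""
    return y - sigma * soft_threshold(y / sigma, 1.0 / sigma)
```

The practical point of the identity is that Step 2 never needs $A^{-1}$ explicitly: a routine for the resolvent of $A$ suffices.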

The lifting technique can be applied to the monotone inclusion problems with four or more operators together 
with Algorithm 1. Since Algorithm 1 handles three operators, it generally requires less lifting than previous 
algorithms. We re-iterate that FBS is a special case of our splitting, so Algorithm 10 is a special case of 
Algorithm 1 applied to (2.15) with a vanished B. 

Because both Algorithms 1 and 10 solve the problem (1.1), it is interesting to compare them. Note that one cannot obtain one algorithm from the other through algebraic manipulation. Both algorithms apply $J_A$, $J_B$, and $C$ once every iteration. We managed to rewrite Algorithm 1 in the following equivalent form (see Appendix B for a derivation) that is most similar to Algorithm 10 for the purpose of comparison:

Algorithm 11 (Algorithm 1 in an equivalent form) Set an arbitrary $x^0$ and $y^0$. For $k = 1, \ldots$, iterate:

1. get $x^k = J_{\gamma B}(x^{k-1} - \gamma Cx^{k-1} - \gamma y^{k-1})$;

2. get $y^k = J_{\frac{1}{\gamma}A^{-1}}\left(y^{k-1} + \frac{1}{\gamma}(2x^k - x^{k-1}) + (Cx^{k-1} - Cx^k)\right)$ //comment: $J_{\sigma A^{-1}} = I - \sigma J_{\sigma^{-1}A} \circ (\sigma^{-1}I)$.

The difference between Algorithms 10 and 11 is the extra correction term $Cx^{k-1} - Cx^k$. Without the correction term, we cannot eliminate $y^k$ and express Algorithm 10 in the form of (1.4).
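The equivalence between Algorithm 1 and its primal-dual form can be observed numerically. The sketch below is an illustration under strong simplifying assumptions: scalar linear operators $Ax = ax$, $Bx = bx$, $Cx = cx$ with $a > 0$, relaxation parameter 1, and the dual update written with the correction term $Cx^{k-1} - Cx^k$, which is the sign for which the two recursions agree in this test. Algorithm 1 is started from $z = x^0 - \gamma Cx^0 - \gamma y^0$ so that its first $x_B$ iterate coincides with $x^1$. The function names are hypothetical.

```python
def algorithm_1(z, a, b, c, gamma, iters):
    """Algorithm 1 with relaxation 1 for scalar linear operators
    A x = a*x, B x = b*x, C x = c*x; returns the list of x_B iterates."""
    xs = []
    for _ in range(iters):
        x_b = z / (1.0 + gamma * b)                                   # J_{gamma B}(z)
        x_a = (2.0 * x_b - z - gamma * c * x_b) / (1.0 + gamma * a)  # J_{gamma A}(...)
        z = z + x_a - x_b
        xs.append(x_b)
    return xs


def algorithm_11(x, y, a, b, c, gamma, iters):
    """Primal-dual form; returns the list of iterates x^1, x^2, ..."""
    xs = []
    for _ in range(iters):
        x_new = (x - gamma * c * x - gamma * y) / (1.0 + gamma * b)
        v = y + (2.0 * x_new - x) / gamma + (c * x - c * x_new)
        y = v / (1.0 + 1.0 / (gamma * a))  # J_{(1/gamma) A^{-1}}(v); needs a > 0
        x = x_new
        xs.append(x)
    return xs
```

For example, with `a, b, c, gamma = 2.0, 1.0, 0.5, 0.4` and `x0, y0 = 1.0, 0.3`, the lists returned by `algorithm_1(x0 - gamma*c*x0 - gamma*y0, a, b, c, gamma, 25)` and `algorithm_11(x0, y0, a, b, c, gamma, 25)` can be compared entry by entry.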


3 Convergence theory 

In this section, we show that Problem (1.1) can be solved by iterating the operator $T$ defined in Equation (1.3):

$$T = I_{\mathcal{H}} - J_{\gamma B} + J_{\gamma A} \circ (2J_{\gamma B} - I_{\mathcal{H}} - \gamma C \circ J_{\gamma B}).$$

Figure 3.1 depicts the process of applying $T$ to a point $z \in \mathcal{H}$. Lemma 3.1 defines the points in Figure 3.1.

Lemma 3.1 Let $z^k \in \mathcal{H}$ and define points:

$$x_B^k := J_{\gamma B}(z^k), \quad z' := 2x_B^k - z^k, \quad x_A^k := J_{\gamma A}(z''),$$

$$u_B^k := \gamma^{-1}(z^k - x_B^k) \in Bx_B^k, \quad z'' := z' - \gamma Cx_B^k, \quad u_A^k := \gamma^{-1}(z'' - x_A^k) \in Ax_A^k.$$

Then the following identities hold:

$$Tz^k - z^k = x_A^k - x_B^k = -\gamma(u_B^k + u_A^k + Cx_B^k) \quad \text{and} \quad Tz^k = x_A^k + \gamma u_B^k.$$

When $B = \partial g$, we let $\tilde{\nabla}g(x_B^k) := u_B^k \in \partial g(x_B^k)$. Likewise when $A = \partial f$, we let $\tilde{\nabla}f(x_A^k) := u_A^k \in \partial f(x_A^k)$.

Proof Observe that $Tz^k = z^k + x_A^k - x_B^k$ by the definition of $T$ (Equation (1.3)). In addition, $Tz^k = x_A^k + z^k - x_B^k = x_A^k + \gamma u_B^k$. Finally, we have $x_A^k - x_B^k = 2x_B^k - z^k - \gamma u_A^k - \gamma Cx_B^k - x_B^k = -\gamma(u_A^k + u_B^k + Cx_B^k)$. □

Fig. 3.1: The mapping $T : z^k \mapsto z^{k+1} := Tz^k$, with $x_A^k := J_{\gamma A}(2x_B^k - z^k - \gamma Cx_B^k)$. The vectors $u_B^k \in Bx_B^k$ and $u_A^k \in Ax_A^k$ are defined in Lemma 3.1.


The following lemma computes a fixed point identity for the operator $T$. It shows that we can recover a zero of $A + B + C$ from any fixed point $z^*$ of $T$ by computing $J_{\gamma B}z^*$.

Lemma 3.2 (Fixed-point encoding) The following set equality holds:

$$\mathrm{zer}(A + B + C) = J_{\gamma B}(\mathrm{Fix}\,T).$$

In addition,

$$\mathrm{Fix}\,T = \{x + \gamma u \mid 0 \in (A + B + C)x,\ u \in (Bx) \cap (-Ax - Cx)\}.$$

The proof can be found in Appendix C. The next lemma will help us establish the averaged coefficient of the operator $T$ in the next proposition. Note that in the lemma, if we let $W := 0$, $U := I_{\mathcal{H}} - J_{\gamma B}$, and $T_1 := J_{\gamma A}$, the operator $S$ reduces to the DRS operator $I_{\mathcal{H}} - J_{\gamma B} + J_{\gamma A} \circ (2J_{\gamma B} - I_{\mathcal{H}})$, which is known to be $1/2$-averaged.

Lemma 3.3 Let $S := U + T_1 \circ V$, where $U, T_1 : \mathcal{H} \to \mathcal{H}$ are both firmly nonexpansive and $V : \mathcal{H} \to \mathcal{H}$. Let $W = I_{\mathcal{H}} - (2U + V)$. Then we have for all $z, w \in \mathcal{H}$:

$$\|Sz - Sw\|^2 \leq \|z - w\|^2 - \|(I_{\mathcal{H}} - S)z - (I_{\mathcal{H}} - S)w\|^2 - 2\langle T_1 \circ Vz - T_1 \circ Vw, Wz - Ww\rangle. \tag{3.1}$$

The proof can be found in Appendix C. 

The following proposition will show that the operator T is averaged. This proposition is crucial for proving 
the convergence of Algorithm 1 . 

Proposition 3.1 (Averagedness of $T$) Suppose that $T_1, T_2 : \mathcal{H} \to \mathcal{H}$ are firmly nonexpansive and $C$ is $\beta$-cocoercive, $\beta > 0$. Let $\gamma \in (0, 2\beta)$. Then

$$T := I_{\mathcal{H}} - T_2 + T_1 \circ (2T_2 - I_{\mathcal{H}} - \gamma C \circ T_2)$$

is $\alpha$-averaged with coefficient $\alpha := \frac{2\beta}{4\beta - \gamma} < 1$. In particular, the following inequality holds for all $z, w \in \mathcal{H}$:

$$\|Tz - Tw\|^2 \leq \|z - w\|^2 - \frac{1 - \alpha}{\alpha}\|(I_{\mathcal{H}} - T)z - (I_{\mathcal{H}} - T)w\|^2. \tag{3.2}$$

Proof To apply Lemma 3.3, we let $U := I_{\mathcal{H}} - T_2$, $V := 2T_2 - I_{\mathcal{H}} - \gamma C \circ T_2$, and $W := \gamma C \circ T_2$. Note that $U$ is firmly nonexpansive (because $T_2$ is), and we have $W = I_{\mathcal{H}} - (2U + V)$. Let $S := T = I_{\mathcal{H}} - T_2 + T_1 \circ V$. We evaluate the inner product in (3.1) as follows:

$$-2\langle T_1 \circ Vz - T_1 \circ Vw, Wz - Ww\rangle$$

$$= 2\langle (I_{\mathcal{H}} - T)z - (I_{\mathcal{H}} - T)w, \gamma C \circ T_2z - \gamma C \circ T_2w\rangle - 2\langle T_2z - T_2w, \gamma C \circ T_2z - \gamma C \circ T_2w\rangle$$

$$\leq \varepsilon\|(I_{\mathcal{H}} - T)z - (I_{\mathcal{H}} - T)w\|^2 + \frac{\gamma^2}{\varepsilon}\|C \circ T_2z - C \circ T_2w\|^2 - 2\gamma\beta\|C \circ T_2z - C \circ T_2w\|^2$$

$$= \varepsilon\|(I_{\mathcal{H}} - T)z - (I_{\mathcal{H}} - T)w\|^2 - \gamma(2\beta - \gamma/\varepsilon)\|C \circ T_2z - C \circ T_2w\|^2,$$

where the inequality follows from Young’s inequality with any $\varepsilon > 0$ and the fact that $C$ is $\beta$-cocoercive. We set

$$\varepsilon := \gamma/(2\beta) < 1$$

so that the coefficient $\gamma(2\beta - \gamma/\varepsilon) = 0$. Now applying Lemma 3.3 and using $S = T$, we obtain

$$\|Tz - Tw\|^2 \leq \|z - w\|^2 - (1 - \varepsilon)\|(I_{\mathcal{H}} - T)z - (I_{\mathcal{H}} - T)w\|^2,$$

which is identical to (3.2) under our definition of $\alpha$. □
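Inequality (3.2) can be spot-checked numerically. The sketch below is only an illustration on $\mathbb{R}^2$ under assumed choices that are not part of the proposition: $T_1$ is the projection onto the box $[-1, 1]^2$, $T_2$ is the projection onto the $x_1$-axis, and $C(x) = (x_1 + x_2, x_1 + x_2)$, which is the gradient of $\frac{1}{2}(x_1 + x_2)^2$ and hence $\frac{1}{2}$-cocoercive. The helper `worst_gap` (a hypothetical name) returns the largest value of the left-hand side minus the right-hand side of (3.2) over all pairs of grid points, which should not be positive beyond rounding error.

```python
import itertools

BETA = 0.5  # C below is the gradient of (1/2)(x1 + x2)^2, hence (1/2)-cocoercive


def C(x):
    s = x[0] + x[1]
    return (s, s)


def T2(x):
    """Projection onto the x1-axis (resolvent of its normal cone)."""
    return (x[0], 0.0)


def T1(x):
    """Projection onto the box [-1, 1]^2 (an illustrative choice)."""
    return tuple(max(-1.0, min(1.0, v)) for v in x)


def T(z, gamma):
    """T = I - T2 + T1 o (2*T2 - I - gamma * C o T2)."""
    t2 = T2(z)
    c = C(t2)
    inner = tuple(2.0 * t2[i] - z[i] - gamma * c[i] for i in range(2))
    t1 = T1(inner)
    return tuple(z[i] - t2[i] + t1[i] for i in range(2))


def sqdist(p, q):
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def worst_gap(gamma, grid):
    """Largest value of LHS - RHS of inequality (3.2) over all pairs in grid."""
    alpha = 2.0 * BETA / (4.0 * BETA - gamma)
    worst = float("-inf")
    for z, w in itertools.product(grid, repeat=2):
        tz, tw = T(z, gamma), T(w, gamma)
        rz = tuple(z[i] - tz[i] for i in range(2))
        rw = tuple(w[i] - tw[i] for i in range(2))
        gap = sqdist(tz, tw) - sqdist(z, w) + (1.0 - alpha) / alpha * sqdist(rz, rw)
        worst = max(worst, gap)
    return worst
```

Here $\gamma$ must lie in $(0, 2\beta) = (0, 1)$; a finite grid cannot prove the inequality, but a positive gap would expose a mistake in an implementation of $T$.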

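Remark 3.1 It is easy to slightly strengthen the inequality (3.2) as follows: For any $\varepsilon \in (0, 1)$ and $\gamma \in (0, 2\beta\varepsilon)$, let $\alpha := 1/(2 - \varepsilon) < 1$. Then the following holds for all $z, w \in \mathcal{H}$:

$$\|Tz - Tw\|^2 \leq \|z - w\|^2 - \frac{1 - \alpha}{\alpha}\|(I_{\mathcal{H}} - T)z - (I_{\mathcal{H}} - T)w\|^2 - \gamma\left(2\beta - \frac{\gamma}{\varepsilon}\right)\|C \circ T_2(z) - C \circ T_2(w)\|^2. \tag{3.3}$$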

Remark 3.2 When $C = 0$, the mapping in Equation (1.5) reduces to $S = \mathrm{refl}_{\gamma A} \circ \mathrm{refl}_{\gamma B}$, which is nonexpansive because it is the composition of nonexpansive maps. Thus, $T = (1/2)I_{\mathcal{H}} + (1/2)S$ is firmly nonexpansive by definition. However, when $C \neq 0$, the mapping $S$ in (1.5) is no longer nonexpansive. The mapping $2T_2 - I_{\mathcal{H}} - \gamma C \circ T_2$, which is a part of $S$, can be expansive. Indeed, consider the following example: Let $\mathcal{H} = \mathbb{R}^2$, let $B = \partial\iota_{\{(x_1, 0) \mid x_1 \in \mathbb{R}\}}$ be the normal cone of the $x_1$ axis, and let $C = \nabla((1/2)\|x_1 + x_2\|^2) = (x_1 + x_2, x_1 + x_2)$. In particular, $T_2(x_1, x_2) = J_{\gamma B}(x_1, x_2) = (x_1, 0)$ for all $(x_1, x_2) \in \mathbb{R}^2$ and $\gamma > 0$. Then the point $0$ is a fixed point of $R = 2T_2 - I_{\mathcal{H}} - \gamma C \circ T_2$, and

$$R(1, 1) = (1, -1) - \gamma C(1, 0) = (1 - \gamma, -1 - \gamma).$$

Therefore, $\|R(1, 1) - R(0, 0)\| = \sqrt{2 + 2\gamma^2} > \sqrt{2} = \|(1, 1) - (0, 0)\|$ for all $\gamma > 0$.
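The arithmetic of this counterexample is easy to reproduce. The sketch below evaluates $R = 2T_2 - I_{\mathcal{H}} - \gamma C \circ T_2$ for the operators of the example and returns the ratio $\|R(1,1) - R(0,0)\| / \|(1,1) - (0,0)\|$, which by the computation above equals $\sqrt{1 + \gamma^2}$. The function names are hypothetical.

```python
import math


def R(x, gamma):
    """R = 2*T2 - I - gamma * C o T2, with T2 the projection onto the x1-axis
    and C(x) = (x1 + x2, x1 + x2), as in the example above."""
    t2 = (x[0], 0.0)
    s = t2[0] + t2[1]
    return (2.0 * t2[0] - x[0] - gamma * s, 2.0 * t2[1] - x[1] - gamma * s)


def expansion_ratio(gamma):
    """||R(1,1) - R(0,0)|| divided by ||(1,1) - (0,0)||."""
    r1, r0 = R((1.0, 1.0), gamma), R((0.0, 0.0), gamma)
    return math.hypot(r1[0] - r0[0], r1[1] - r0[1]) / math.sqrt(2.0)
```

A ratio above 1 for every positive $\gamma$ confirms that $R$ is expansive along this pair of points.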

Remark 3.3 When $B = 0$, the averaged parameter $\alpha = 2\beta/(4\beta - \gamma)$ in Proposition 3.1 reduces to the best (i.e., smallest) known averaged coefficient for the forward-backward splitting algorithm [19, Proposition 2.4].

We are now ready to prove convergence of Algorithm 1.

Theorem 3.1 (Main convergence theorem) Suppose that $\mathrm{Fix}\,T \neq \emptyset$. Set a stepsize $\gamma \in (0, 2\beta\varepsilon)$, where $\varepsilon \in (0, 1)$. Set $(\lambda_j)_{j \geq 0} \subseteq (0, 1/\alpha)$ as a sequence of relaxation parameters, where $\alpha = 1/(2 - \varepsilon) > 2\beta/(4\beta - \gamma)$, such that for $\tau_k := (1 - \alpha\lambda_k)\lambda_k/\alpha$ we have $\sum_{i=0}^{\infty} \tau_i = \infty$. Pick any start point $z^0 \in \mathcal{H}$. Let $(z^j)_{j \geq 0}$ be generated by Algorithm 1, i.e., the following iteration: for all $k \geq 0$,

$$z^{k+1} = z^k + \lambda_k(Tz^k - z^k).$$

Then the following hold:

1. Let $z^* \in \mathrm{Fix}\,T$. Then $(\|z^j - z^*\|)_{j \geq 0}$ is monotonically decreasing.

2. The sequence $(\|Tz^j - z^j\|)_{j \geq 0}$ is monotonically decreasing and converges to 0.

3. The sequence $(z^j)_{j \geq 0}$ weakly converges to a fixed point of $T$.

4. Let $x^* \in \mathrm{zer}(A + B + C)$. Suppose that $\inf_{j \geq 0} \lambda_j > 0$. Then the following sum is finite:

$$\sum_{k=0}^{\infty} \lambda_k\|Cx_B^k - Cx^*\|^2 \leq \frac{1}{\gamma(2\beta - \gamma/\varepsilon)}\|z^0 - z^*\|^2.$$

In particular, $(Cx_B^j)_{j \geq 0}$ converges strongly to $Cx^*$.

5. Suppose that $\inf_{j \geq 0} \lambda_j > 0$ and let $z^*$ be the weak sequential limit of $(z^j)_{j \geq 0}$. Then the sequence $(J_{\gamma B}(z^j))_{j \geq 0}$ weakly converges to $J_{\gamma B}(z^*) \in \mathrm{zer}(A + B + C)$.

6. Suppose that $\inf_{j \geq 0} \lambda_j > 0$ and let $z^*$ be the weak sequential limit of $(z^j)_{j \geq 0}$. Then the sequence $(J_{\gamma A} \circ (2J_{\gamma B} - I_{\mathcal{H}} - \gamma C \circ J_{\gamma B})(z^j))_{j \geq 0}$ weakly converges to $J_{\gamma B}(z^*) \in \mathrm{zer}(A + B + C)$.

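7. Suppose that $\tau := \inf_{j \geq 0} \tau_j > 0$. For all $k \geq 0$, the following convergence rates hold:

$$\|Tz^k - z^k\|^2 \leq \frac{\|z^0 - z^*\|^2}{\tau(k + 1)} \quad \text{and} \quad \|Tz^k - z^k\|^2 = o\left(\frac{1}{k + 1}\right)$$

for any point $z^* \in \mathrm{Fix}(T)$.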

8. Let $z^*$ be the weak sequential limit of $(z^j)_{j \geq 0}$. The sequences $(J_{\gamma B}(z^j))_{j \geq 0}$ and $(J_{\gamma A} \circ (2J_{\gamma B} - I_{\mathcal{H}} - \gamma C \circ J_{\gamma B})(z^j))_{j \geq 0}$ converge strongly to a point in $\mathrm{zer}(A + B + C)$ whenever any of the following holds:

(a) $A$ is uniformly monotone$^3$ on every nonempty bounded subset of $\mathrm{dom}(A)$;

(b) $B$ is uniformly monotone on every nonempty bounded subset of $\mathrm{dom}(B)$;

(c) $C$ is demiregular at every point $x \in \mathrm{zer}(A + B + C)$.$^4$

3 A mapping $A$ is uniformly monotone if there exists an increasing function $\phi : \mathbb{R}_+ \to [0, +\infty]$ such that $\phi(0) = 0$ and for any $u \in Ax$ and $v \in Ay$, $\langle x - y, u - v\rangle \geq \phi(\|x - y\|)$. If $\phi = \beta(\cdot)^2$ with $\beta > 0$, then the mapping $A$ is strongly monotone. If a proper function $f$ is uniformly (strongly) convex, then $\partial f$ is uniformly (strongly, resp.) monotone.

4 A mapping $C$ is demiregular at $x \in \mathrm{dom}(C)$ if for all $u \in Cx$ and all sequences $(x^k, u^k) \in \mathrm{gra}(C)$ with $x^k \rightharpoonup x$ and $u^k \to u$, we have $x^k \to x$.

Proof Part 1: Fix $k \geq 0$. Observe that

$$\|z^{k+1} - z^*\|^2 = \|(1 - \lambda_k)(z^k - z^*) + \lambda_k(Tz^k - z^*)\|^2$$

$$= (1 - \lambda_k)\|z^k - z^*\|^2 + \lambda_k\|Tz^k - z^*\|^2 - \lambda_k(1 - \lambda_k)\|Tz^k - z^k\|^2 \tag{3.4}$$

by [3, Corollary 2.14]. In addition, from Equation (3.3), we have

$$\|Tz^k - z^*\|^2 \leq \|z^k - z^*\|^2 - \frac{1 - \alpha}{\alpha}\|Tz^k - z^k\|^2 - \gamma\left(2\beta - \frac{\gamma}{\varepsilon}\right)\|C \circ T_2(z^k) - C \circ T_2(z^*)\|^2.$$

Therefore, the monotonicity follows by combining the above two equations and using the simplification

$$\tau_k = \lambda_k(1 - \lambda_k) + \frac{\lambda_k(1 - \alpha)}{\alpha}$$

to get

$$\|z^{k+1} - z^*\|^2 + \tau_k\|Tz^k - z^k\|^2 + \gamma\lambda_k\left(2\beta - \frac{\gamma}{\varepsilon}\right)\|Cx_B^k - CJ_{\gamma B}(z^*)\|^2 \leq \|z^k - z^*\|^2.$$

Part 2: This follows from [3, Proposition 5.15(ii)].

Part 3: This follows from [3, Proposition 5.15(iii)].

Part 4: The inequality follows by summing the last inequality derived in Part 1. The convergence of $(Cx_B^j)_{j \geq 0}$ follows because $\inf_{j \geq 0} \lambda_j > 0$ and the sum is finite.

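Part 5: Recall the notation from Lemma 3.1: set $x_B^k = J_{\gamma B}(z^k)$, $x_A^k = J_{\gamma A}(2x_B^k - z^k - \gamma Cx_B^k)$, $u_B^k = (1/\gamma)(z^k - x_B^k) \in Bx_B^k$, and $u_A^k = (1/\gamma)(2x_B^k - z^k - \gamma Cx_B^k - x_A^k) \in Ax_A^k$.

Since $\|x_B^k - J_{\gamma B}(z^*)\| = \|J_{\gamma B}(z^k) - J_{\gamma B}(z^*)\| \leq \|z^k - z^*\| \leq \|z^0 - z^*\|$ for all $k \geq 0$, the sequence $(x_B^j)_{j \geq 0}$ is bounded and has a weak sequential cluster point $\bar{x}$. Let $x_B^{k_j} \rightharpoonup \bar{x}$ as $j \to \infty$ for an index subsequence $(k_j)_{j \geq 0}$.

Let $x^* \in \mathrm{zer}(A + B + C)$. Because $C$ is maximal monotone, $Cx_B^k \to Cx^*$, and $x_B^{k_j} \rightharpoonup \bar{x}$, it follows by the weak-to-strong sequential closedness of $C$ that $C\bar{x} = Cx^*$ [3, Proposition 20.33(ii)] and thus $Cx_B^{k_j} \to C\bar{x}$. Because $x_A^k - x_B^k = Tz^k - z^k \to 0$ as $k \to \infty$ by Part 2 and Lemma 3.1, it follows that

$$x_B^{k_j} \rightharpoonup \bar{x}, \quad x_A^{k_j} \rightharpoonup \bar{x}, \quad Cx_B^{k_j} \to C\bar{x}, \quad u_B^{k_j} \rightharpoonup \frac{1}{\gamma}(z^* - \bar{x}), \quad \text{and} \quad u_A^{k_j} \rightharpoonup \frac{1}{\gamma}(\bar{x} - z^* - \gamma C\bar{x})$$

as $j \to \infty$.

Thus, [3, Proposition 25.5] applied to $(x_A^{k_j}, u_A^{k_j}) \in \mathrm{gra}\,A$, $(x_B^{k_j}, u_B^{k_j}) \in \mathrm{gra}\,B$, and $(x_B^{k_j}, Cx_B^{k_j}) \in \mathrm{gra}\,C$ shows that $\bar{x} \in \mathrm{zer}(A + B + C)$, $z^* - \bar{x} \in \gamma B\bar{x}$, and $\bar{x} - z^* - \gamma C\bar{x} \in \gamma A\bar{x}$. Hence, as $\bar{x} = J_{\gamma B}(z^*)$ is unique, $\bar{x}$ is the unique weak sequential cluster point of $(x_B^j)_{j \geq 0}$. Therefore, $(x_B^j)_{j \geq 0}$ converges weakly to $J_{\gamma B}(z^*)$ by [3, Lemma 2.38].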

Part 6: Assume the notation of Part 5. We shall show $x_A^k \rightharpoonup J_{\gamma B}(z^*)$. This follows because $x_A^k - x_B^k = Tz^k - z^k \to 0$ as $k \to \infty$ and $x_B^k \rightharpoonup J_{\gamma B}(z^*)$.

Part 7: The result follows from [24, Theorem 1].

Part 8: Assume the notation of Part 5 and let $x^* = J_{\gamma B}(z^*)$, $u_B^* = (1/\gamma)(z^* - x^*) \in Bx^*$, and $u_A^* = (1/\gamma)(x^* - z^*) - Cx^*$. Now we move to the subcases.

Part 8a: Because $B + C$ is monotone and $(x_B^k, u_B^k) \in \mathrm{gra}\,B$, we have $\langle x_B^k - x^*, u_B^k + Cx_B^k - (u_B^* + Cx^*)\rangle \geq 0$ for all $k \geq 0$. Consider the bounded set $S = \{x^*\} \cup \{x_A^j \mid j \geq 0\}$. Then there exists an increasing function $\phi_A : \mathbb{R}_+ \to [0, \infty]$ that vanishes only at 0 such that

$$\gamma\phi_A(\|x_A^k - x^*\|) \leq \gamma\langle x_A^k - x^*, u_A^k - u_A^*\rangle + \gamma\langle x_B^k - x^*, u_B^k + Cx_B^k - (u_B^* + Cx^*)\rangle$$

$$= \gamma\langle x_A^k - x_B^k, u_A^k - u_A^*\rangle + \gamma\langle x_B^k - x^*, u_A^k - u_A^*\rangle + \gamma\langle x_B^k - x^*, u_B^k + Cx_B^k - (u_B^* + Cx^*)\rangle$$

$$= \gamma\langle x_A^k - x_B^k, u_A^k - u_A^*\rangle + \gamma\langle x_B^k - x^*, u_A^k + u_B^k + Cx_B^k\rangle$$

$$= \langle x_B^k - x_A^k, x_B^k - \gamma u_A^k - (x^* - \gamma u_A^*)\rangle$$

$$\leq \langle x_B^k - x_A^k, z^k - z^*\rangle + \gamma\langle x_B^k - x_A^k, Cx_B^k - Cx^*\rangle \to 0 \quad \text{as } k \to \infty,$$

where the convergence to 0 follows because $x_B^k - x_A^k = z^k - Tz^k \to 0$, $z^k \rightharpoonup z^*$, and $Cx_B^k \to Cx^*$ as $k \to \infty$. Hence $x_A^k \to x^*$ strongly. Furthermore, $x_B^k \to x^*$ because $x_A^k - x_B^k \to 0$ as $k \to \infty$.

Part 8b: Because $A$ is monotone, we have $\langle x_A^k - x^*, u_A^k - u_A^*\rangle \geq 0$ for all $k \geq 0$. In addition, note that $B + C$ is also uniformly monotone on all bounded sets. Consider the bounded set $S = \{x^*\} \cup \{x_B^j \mid j \geq 0\}$. Then there exists an increasing function $\phi_B : \mathbb{R}_+ \to [0, \infty]$ that vanishes only at 0 such that

$$\gamma\phi_B(\|x_B^k - x^*\|) \leq \gamma\langle x_A^k - x^*, u_A^k - u_A^*\rangle + \gamma\langle x_B^k - x^*, u_B^k + Cx_B^k - (u_B^* + Cx^*)\rangle \to 0 \quad \text{as } k \to \infty$$

by the argument in Part 8a. Therefore, $x_B^k \to x^*$ strongly.

Part 8c: Note that $Cx_B^k \to Cx^*$ and $x_B^k \rightharpoonup x^*$. Therefore, $x_B^k \to x^*$ by the demiregularity of $C$. □
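Parts 1, 2, and 5 of Theorem 3.1 can be observed on a small instance. The sketch below is a hedged illustration rather than the implementation used in the paper: it assumes $\mathcal{H} = \mathbb{R}^n$, $A$ the normal cone of the box $[0, 1]^n$, $B = \partial(\tau\|\cdot\|_1)$, and $C = \nabla\frac{1}{2}\|x - d\|^2$ (so $\beta = 1$), for which the zero of $A + B + C$ is obtained coordinatewise by soft-thresholding $d$ and clipping to $[0, 1]$. All function names are hypothetical.

```python
import math


def soft(v, t):
    """Soft-thresholding: the resolvent of t * d|.| at v."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def clip01(v):
    """Projection onto [0, 1]: the resolvent of its normal cone."""
    return max(0.0, min(1.0, v))


def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def run_algorithm_1(d, tau, gamma, lam, iters, z0):
    """Iterate z <- z + lam * (Tz - z) for A = normal cone of [0,1]^n,
    B = d(tau * ||.||_1), and C = grad (1/2)||x - d||^2.
    Returns the list of z iterates and the list of x_B iterates."""
    z = list(z0)
    zs, xbs = [list(z)], []
    for _ in range(iters):
        x_b = [soft(v, gamma * tau) for v in z]               # J_{gamma B}(z)
        x_a = [clip01(2.0 * xb - v - gamma * (xb - di))       # J_{gamma A}(...)
               for xb, v, di in zip(x_b, z, d)]
        z = [v + lam * (xa - xb) for v, xa, xb in zip(z, x_a, x_b)]
        zs.append(list(z))
        xbs.append(x_b)
    return zs, xbs
```

With `lam = 1.0`, the distance between consecutive entries of the returned `zs` equals the fixed-point residual $\|Tz^k - z^k\|$, so both monotonicity claims can be checked directly on the output.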

Remark 3.4 Theorem 3.1 can easily be extended to the summable error scenario, where for all $k \geq 0$, we have

$$z^{k+1} = z^k + \lambda_k(Tz^k - z^k + e_k)$$

for a sequence $(e_j)_{j \geq 0} \subseteq \mathcal{H}$ of errors that satisfy $\sum_{i=0}^{\infty} \lambda_i\|e_i\| < \infty$ (e.g., using [19, Proposition 3.4]). The result is straightforward and will only serve to complicate notation, so we omit this extension.

Remark 3.5 Note that the convergence rates for the fixed-point residual in Part 7 of Theorem 3.1 are sharp, even in the case of the variational Problem (1.2) with $h = 0$ [24, Theorem 8].


4 Convergence rates 

In this section, we discuss the convergence rates of Algorithm 1 under several different assumptions on the regularity of the problem. Section 1.2 contains a brief overview of all the convergence rates presented in this section. For readability, we now summarize all of the convergence results of this section, briefly indicate the proof structure, and place the formal proofs in the Appendix.


4.1 General rates 


We establish our most general convergence rates for the following quantities: If $z^*$ is a fixed point of $T$, $x^* = J_{\gamma B}(z^*)$, and $x \in \mathcal{H}$, then let

$$\kappa_1^k(\lambda, x) = \|z^k - x\|^2 - \|z^{k+1} - x\|^2 + \left(1 - \frac{2}{\lambda}\right)\|z^k - z^{k+1}\|^2 + 2\gamma\langle z^k - z^{k+1}, Cx_B^k\rangle,$$

$$\kappa_2^k(\lambda, x^*) = \|z^k - z^*\|^2 - \|z^{k+1} - z^*\|^2 + \left(1 - \frac{2}{\lambda}\right)\|z^k - z^{k+1}\|^2 + 2\gamma\langle z^k - z^{k+1}, Cx_B^k - Cx^*\rangle.$$

In Theorems D.1, D.2, and D.3 we deduce the following convergence rates: For $j \in \{1, 2\}$, $x_1 = x$, $x_2 = x^*$, and for all $k \geq 0$, we have

Nonergodic (Algorithm 1): $\kappa_j^k(1, x_j) = o\left(\frac{1 + \|x_j\|}{\sqrt{k + 1}}\right)$;

Ergodic (Algorithm 1 & (1.6)): $\frac{1}{\sum_{i=0}^{k} \lambda_i}\sum_{i=0}^{k} \kappa_j^i(\lambda_i, x_j) = O\left(\frac{1 + \|x_j\|^2}{k + 1}\right)$;

Ergodic (Algorithm 1 & (1.7)): $\frac{2}{(k + 1)(k + 2)}\sum_{i=0}^{k} (i + 1)\kappa_j^i(\lambda, x_j) = O\left(\frac{1 + \|x_j\|^2}{k + 1}\right)$.

It may be hard to see how these terms relate to the convergence of Algorithm 1. The key observation of Proposition D.2 shows that $\kappa_j^k$ is an upper bound for a certain variational inequality associated to Problem (1.1) and that $\kappa_j^k$ bounds the distance of the current iterate $x^k$ (or its averaged variant $\bar{x}^k$ in Equations (1.6) and (1.7)) to the solution whenever one of the operators is strongly monotone.

The proofs of these convergence rates are straightforward, though technical. The nonergodic rates follow from an application of Part 7 of Theorem 3.1, which shows that $\|z^{k+1} - z^k\|^2 = o(1/(k + 1))$. The ergodic convergence rates follow from the alternating series properties of $\kappa_j^k$ together with the summability of the gradient shown in Part 4 of Theorem 3.1.


4.2 Objective error and variational inequalities 

In this section, we use the convergence rates of the upper and lower bounds derived in Theorems D.1, D.2, and D.3 to deduce convergence rates of function values and variational inequalities. All of the convergence rates have the following orders:

Nonergodic: $o\left(\frac{1}{\sqrt{k + 1}}\right)$ and Ergodic: $O\left(\frac{1}{k + 1}\right)$.

The convergence rates in this section generalize some of the known convergence rates provided in [24,22,23] for Douglas-Rachford splitting, forward-Douglas-Rachford splitting, and the primal-dual forward-backward splitting, Douglas-Rachford splitting, and proximal-point algorithms.


4.2.1 Nonergodic rates (Algorithm 1)

Suppose that $A = \partial f + \bar{A}$, $B = \partial g + \bar{B}$, and $C = \nabla h + \bar{C}$, where $f$, $g$, and $h$ are functions and $\bar{A}$, $\bar{B}$, and $\bar{C}$ are monotone operators. Whenever $f$ and $\bar{A}$ are Lipschitz continuous, the following convergence rate holds:

$$f(x_B^k) + g(x_B^k) + h(x_B^k) - (f + g + h)(x) + \langle x_B^k - x, \bar{A}x_B^k + u_{\bar{B}}^k + \bar{C}x_B^k\rangle = o\left(\frac{1 + \|x\|}{\sqrt{k + 1}}\right). \tag{4.1}$$

A more general rate holds when $f$ and $\bar{A}$ are not necessarily Lipschitz. See Corollaries D.3 and D.4 for the exact convergence statements.

Note that the quantity on the left-hand side of Equation (4.1) can be negative. The point $x_B^k$ is a solution to the variational inequality problem if, and only if, the left-hand side of Equation (4.1) is negative for all $x \in \mathcal{H}$, which is why we include the dependence on $\|x\|$.

Notice that when the operators $\bar{A}$, $\bar{B}$, and $\bar{C}$ vanish and $x = x^*$, the convergence rate in (4.1) reduces to the objective error of the function $f + g + h$ at the point $x_B^k$:

$$f(x_B^k) + g(x_B^k) + h(x_B^k) - (f + g + h)(x^*) = o\left(\frac{1 + \|x^*\|}{\sqrt{k + 1}}\right), \tag{4.2}$$

and we deduce the rate $o(1/\sqrt{k + 1})$ for our method. By [24, Theorem 11], this rate is sharp.

Further nonergodic rates can be deduced whenever any of $A$, $B$, or $C$ are $\mu_A$-, $\mu_B$-, and $\mu_C$-strongly monotone, respectively. In particular, the following two rates hold for all $k \geq 0$ (Corollary D.9):

$$\mu_A\|x_A^k - x^*\|^2 + (\mu_B + \mu_C)\|x_B^k - x^*\|^2 = o\left(\frac{1}{\sqrt{k + 1}}\right),$$

$$\min_{i = 0, \ldots, k}\left\{\mu_A\|x_A^i - x^*\|^2 + (\mu_B + \mu_C)\|x_B^i - x^*\|^2\right\} = o\left(\frac{1}{k + 1}\right).$$
4.2.2 Ergodic rates

We use the same setup as Section 4.2, except we assume that $\bar{A}$ and $\bar{B}$ are skew linear mappings (i.e., $\bar{A}^* = -\bar{A}$ and $\bar{B}^* = -\bar{B}$) and $\bar{C} = 0$. If $(\bar{x}_B^j)_{j \geq 0}$ is generated as in Equation (1.6) or Equation (1.7) and $f$ is Lipschitz continuous, the following convergence rate holds:

$$f(\bar{x}_B^k) + g(\bar{x}_B^k) + h(\bar{x}_B^k) - (f + g + h)(x) + \langle \bar{x}_B^k - x, \bar{A}\bar{x}_B^k + \bar{B}\bar{x}_B^k\rangle = O\left(\frac{1 + \|x\|^2}{k + 1}\right). \tag{4.3}$$

A more general rate holds when $f$ is not necessarily Lipschitz. See Corollaries D.5-D.8 for the exact convergence statements.

Further ergodic rates can be deduced whenever any of $A$, $B$, or $C$ are $\mu_A$-, $\mu_B$-, and $\mu_C$-strongly monotone, respectively. In particular, the following rate holds for all $k \geq 0$ (Corollary D.9): Let $(\bar{x}_A^j)_{j \geq 0}$ and $(\bar{x}_B^j)_{j \geq 0}$ be generated by Algorithm 1 and Equations (1.6) or (1.7). Then

$$\mu_A\|\bar{x}_A^k - x^*\|^2 + (\mu_B + \mu_C)\|\bar{x}_B^k - x^*\|^2 = O\left(\frac{1}{k + 1}\right).$$


4.3 Improving the objective error with Lipschitz differentiability 

The worst case convergence rate $o(1/\sqrt{k + 1})$ for the objective error proved in Corollary D.3 is quite slow. Although averaging can improve the rate of convergence, this technique does not necessarily translate into better practical performance, as discussed in Section 1.3.1. We can deduce a better rate of convergence for the nonergodic iterate whenever one of the functions $f$ or $g$ has a Lipschitz continuous derivative. In particular, if $\nabla f$ exists and is Lipschitz, we show in Proposition D.3 that the objective error sequence $((f + g + h)(x_B^j) - (f + g + h)(x^*))_{j \geq 0}$ is summable. From this, we immediately deduce (Theorem D.5) the following rate: for all $k \geq 0$, we have

$$\min_{i = 0, \ldots, k}\left\{(f + g + h)(x_B^i) - (f + g + h)(x^*)\right\} = o\left(\frac{1}{k + 1}\right).$$

A similar result holds for the objective error sequence $((f + g + h)(x_A^j) - (f + g + h)(x^*))_{j \geq 0}$ when the function $g$ is Lipschitz differentiable. Thus, when $f$ or $g$ is sufficiently regular, the convergence rate of the nonergodic iterate is actually faster than the convergence rate for the ergodic iterate, which motivates its use in practice.
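The step from summability to an $o(1/(k + 1))$ rate for the running minimum is a standard argument, sketched here under the assumption that the objective errors $a_i := (f + g + h)(x_B^i) - (f + g + h)(x^*)$ are nonnegative. The symbols $a_i$ and $m_k$ are introduced only for this illustration.

```latex
% Assumption: a_i >= 0 for all i and \sum_{i=0}^{\infty} a_i < \infty.
% Let m_k := \min_{0 \le i \le k} a_i. At least (k+1)/2 indices lie in
% \{\lfloor k/2 \rfloor, \ldots, k\}, and m_k is no larger than any a_i there, so
\[
  \frac{k+1}{2}\, m_k \;\le\; \sum_{i=\lfloor k/2 \rfloor}^{k} a_i
  \;\le\; \sum_{i=\lfloor k/2 \rfloor}^{\infty} a_i \;\longrightarrow\; 0
  \quad \text{as } k \to \infty .
\]
% Hence (k+1) m_k \to 0, that is, m_k = o(1/(k+1)).
```

The tail of a convergent series vanishes, which is all the argument uses; no rate on the individual terms $a_i$ is needed.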
4.4 Linear convergence

Whenever $A$, $B$, and $C$ are sufficiently regular, we can show that the operator $T$ is strictly contractive towards the fixed point set. In particular, Algorithm 1 converges linearly whenever

$$(\mu_A + \mu_B + \mu_C)(1/L_A + 1/L_B) > 0,$$

where $L_A$ and $L_B$ are the Lipschitz constants of $A$ and $B$, respectively, and $A$, $B$, or $C$ are $\mu_A$-, $\mu_B$-, and $\mu_C$-strongly monotone, respectively (where we allow $L_A = L_B = \mu_A = \mu_B = \mu_C = 0$).

Note that this linear convergence result is the best we can expect in some sense. Indeed, even if $C$ and $A$ are strongly monotone, Algorithm 1 will not necessarily converge linearly. In Section D.6 we provide an example such that

$$\mu_A\mu_C > 0, \text{ but } \|z^k - z^*\| \text{ converges arbitrarily slowly to } 0.$$


4.5 Convergence rates for multi-block ADMM 

All of the results in this section imply convergence rates for Algorithm 8, which is applied to the dual 
objective in Problem (2.12). Using the techniques of [24, Section 8] and [25, Section 6], we can easily derive 
convergence rates of the primal objective in Problem (2.11). We do not pursue these results in this paper 
due to lack of space. 


5 Numerical results 

In this section, we present some numerical examples of Algorithm 1. We emphasize that to keep our imple¬ 
mentations simple, we did not attempt to optimize the codes or their parameters for best performance. We 
also did not attempt to seriously evaluate the prediction ability of the models we tested, which is beyond the 
scope of this paper. Our Matlab codes will be released online on the authors’ websites. All tests were run on 
a PC with 32GB memory and an Intel i5-3570 CPU with Ubuntu 12.04 and Matlab R2011b installed. 


5.1 Image inpainting with texture completion 

This section presents the results of applying Problem (2.8) to the color images$^5$ of a building, parts of which are manually occluded with white colors. See Figure 5.1. The images have a 517 × 493 resolution and three color channels. At each iteration of Algorithm 1, the SVDs of two matrices of sizes 517 × 1479 and 1551 × 493 consume most of the computing time. However, it took fewer than 150 iterations to return good recoveries.

5 We are grateful to Professor Ji Liu for sharing his data in [33] with us.

Fig. 5.1: Images recovered by solving the tensor completion Problem (2.8) using Algorithm 1 for two different types of occlusions. Panels: (a) original image; (b) occluded image 1; (d) recovered image 1; (e) recovered image 2.


5.2 Matrix completion for movie recommendations 

In this section, we apply Problem (2.9) to a movie recommendation dataset. In this example, each row of $X_0 \in \mathbb{R}^{m \times n}$ corresponds to a user and each column corresponds to a movie, and for all $i = 1, \ldots, m$ and $j = 1, \ldots, n$, the matrix entry $(X_0)_{ij}$ is the rating that user $i$ gave to movie $j$.

We use the MovieLens-1M [1] dataset for evaluation. This dataset consists of 1000209 observations of the matrix $X_0 \in \mathbb{R}^{6040 \times 3952}$. We plot our numerical results in Figure 5.2. In our code we set $l = 0$, $u = 5$ and solved the problem with different choices of the regularization parameter in order to achieve solutions of desired rank. In Figure 5.2c we plot the root mean-square error

$$\frac{\|\mathcal{A}(X^k - X_0)\|_F}{\sqrt{1000209}}, \tag{5.1}$$

which does not decrease to zero, but represents how closely the current iterate fits the observed data.
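The error measure above only involves the observed entries. A minimal sketch of its computation follows, assuming matrices stored as lists of rows and a list of observed index pairs standing in for the sampling operator $\mathcal{A}$; the function name `observed_rmse` and this data layout are illustrative choices, not taken from the authors' Matlab code.

```python
import math


def observed_rmse(x, x0, observed):
    """Root-mean-square error over the observed entries only.

    x, x0: matrices as lists of rows; observed: iterable of (i, j) index
    pairs playing the role of the sampling operator A."""
    observed = list(observed)
    total = sum((x[i][j] - x0[i][j]) ** 2 for i, j in observed)
    return math.sqrt(total / len(observed))
```

Entries outside `observed` never enter the sum, which is why this quantity measures data fit rather than prediction quality.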

The code runs fairly quickly for the scale of the data. The main bottleneck in this algorithm is evaluating the proximal operator of $\|\cdot\|_*$, which requires computing the SVD of a 6040 × 3952 matrix.

Fig. 5.2: Run time and convergence rate statistics for the matrix completion Problem (2.9) on the MovieLens-1M database [1]. Panels: (a) fixed-point residual at iteration $k$; (b) rank at iteration $k$; (c) root mean square error (Equation (5.1)) at iteration $k$.


5.3 Support vector machine classification 

In support vector machine classification we have a kernel matrix $K \in \mathbb{R}^{d \times d}$ generated from a training set $\mathcal{X} = \{t_1, \ldots, t_d\}$ using a kernel function $\mathcal{K} : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$: for all $i, j = 1, \ldots, d$, we have $K_{ij} = \mathcal{K}(t_i, t_j)$. In our particular example, $\mathcal{X} \subseteq \mathbb{R}^n$ for some $n > 0$ and $\mathcal{K}_\sigma : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}_{++}$ is the Gaussian kernel given by $\mathcal{K}_\sigma(t, t') = e^{-\sigma\|t - t'\|^2}$ for some $\sigma > 0$. We are also given a label vector $y \in \{-1, 1\}^d$, which indicates the label given to each point in $\mathcal{X}$. Finally, we are given a real number $C > 0$ that controls how much we let our final classifier stray from perfect classification on the training set $\mathcal{X}$.

We define constraint sets $C_1 = \{x \in \mathbb{R}^d \mid 0 \leq x \leq C\}$ and $C_2 = \{x \in \mathbb{R}^d \mid \langle y, x\rangle = 0\}$. We also define $Q_0 = \mathrm{diag}(y)K\mathrm{diag}(y)$. Then the solution to Problem (2.10) with $Q = Q_0$ is precisely the dual form soft-margin SVM classifier [21]. Unfortunately, the Lipschitz constant of $Q_0$ is often quite large (i.e., $\gamma$ must be small), which results in poor practical performance. Thus, to improve practical performance we solve Problem (2.10) with $Q = P_{C_2}Q_0P_{C_2}$, which is equivalent to the original problem because the minimizer must lie in $C_2$. The result is a much smaller Lipschitz constant for $Q$ and better practical performance. This trick was first reported in [23, Section 1.6].

                 kernel parameter σ
C         2^{-5}     2^{-3}     2^{-1}     2
1         0.82689    0.83636    0.82782    0.7755
2^2       0.82658    0.82441    0.81742    0.7755
2^4       0.83465    0.81835    0.8168     0.7755
2^6       0.83465    0.81835    0.80795    0.7755

Table 5.1: Classification accuracy for different choices of $C$ and $\sigma$ in the SVM model.
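The ingredients of this setup are simple to assemble. The sketch below is a small pure-Python illustration with hypothetical function names, not the authors' Matlab code: it builds the Gaussian kernel matrix, forms $Q_0 = \mathrm{diag}(y)K\mathrm{diag}(y)$, implements the projections onto $C_1$ and $C_2$, and applies $Q = P_{C_2}Q_0P_{C_2}$ to a vector without forming $Q$ explicitly.

```python
import math


def gaussian_kernel_matrix(points, sigma):
    """K[i][j] = exp(-sigma * ||t_i - t_j||^2)."""
    def sqdist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return [[math.exp(-sigma * sqdist(p, q)) for q in points] for p in points]


def label_scaled(K, y):
    """Q0 = diag(y) K diag(y), i.e. Q0[i][j] = y_i * K[i][j] * y_j."""
    n = len(y)
    return [[y[i] * K[i][j] * y[j] for j in range(n)] for i in range(n)]


def project_c2(x, y):
    """Projection onto the hyperplane C2 = {x : <y, x> = 0}."""
    coef = sum(a * b for a, b in zip(y, x)) / sum(b * b for b in y)
    return [a - coef * b for a, b in zip(x, y)]


def project_c1(x, c):
    """Projection onto the box C1 = {x : 0 <= x <= c}."""
    return [max(0.0, min(c, a)) for a in x]


def apply_q(Q0, y, x):
    """Apply Q = P_{C2} Q0 P_{C2} to x without forming Q explicitly."""
    p = project_c2(x, y)
    q = [sum(row[j] * p[j] for j in range(len(p))) for row in Q0]
    return project_c2(q, y)
```

Because $P_{C_2}$ is a linear orthogonal projector, every output of `apply_q` lies in $C_2$, which is what allows the modified quadratic to replace the original one on that subspace.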

We evaluated our algorithm on a subset $\mathcal{X}_{\mathrm{all}}$ of the UCI “Adult” machine learning dataset which is entitled “a7a” and is available from the LIBSVM website [13]. Our training set $\mathcal{X}_{\mathrm{train}}$ consisted of a $d = 9660$ element subsample of this 16100 element training set (i.e., a 60% sample). Note that $Q$ has $d^2 = 9660^2 = 93315600$ nonzero entries. In Table 5.1, we trained the SVM model (2.10) with different choices of parameters $C$ and $\sigma$, and then evaluated their prediction accuracy on the remaining $16100 - 9660 = 6440$ elements in $\mathcal{X}_{\mathrm{test}} = \mathcal{X}_{\mathrm{all}} \setminus \mathcal{X}_{\mathrm{train}}$. We found that the parameters $\sigma = 2^{-3}$ and $C = 1$ gave the best performance on the test set, so we set these to be the parameters for our numerical experiments.

Figure 5.3 plots the results of our test. Figures 5.3a and 5.3b compare the line search method in Algorithm 3 with the basic Algorithm 1. We see that the line search method performs better than the basic algorithm in terms of the number of iterations and the total CPU time needed to reach a desired accuracy. Because of the linearity of the projection $P_{C_2}$, we can find a closed form solution for the line search weight $\rho$ in Algorithm 3 as the root of a third degree polynomial. Thus, although Algorithm 3 requires more work per iteration than Algorithm 1, it still takes less time overall because Algorithm 1 must compute $\beta = 1/\|Q\|$, which is quite costly. 

Finally, in Figure 5.3c we compare the performance of the nonergodic iterate generated by Algorithm 1, 
the standard ergodic iterate (1.6), and the newly introduced ergodic iterate (1.7). We see that the nonergodic 
iterate performs better than the other two, and as expected, the new ergodic iterate outperforms the 
standard ergodic iterate. We emphasize that computing these iterates is essentially costless for the user and 
only modifies the final output of the algorithm, not the trajectory. 
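
To illustrate why these averages are essentially free, here is a minimal sketch of running updates for both ergodic iterates. It assumes, as the appendix theorems suggest, that the standard iterate (1.6) weights the $i$-th point by $\lambda_i$ and that the new iterate (1.7) weights it by $i + 1$; the class name `ErgodicAverages` is hypothetical.

```python
import numpy as np

class ErgodicAverages:
    """Running weighted averages of a sequence of iterates.

    standard: weights lambda_i     (assumed form of iterate (1.6))
    weighted: weights (i + 1)      (assumed form of iterate (1.7))
    """

    def __init__(self, dim):
        self.lam_sum = 0.0
        self.weight_sum = 0.0
        self.count = 0
        self.standard = np.zeros(dim)
        self.weighted = np.zeros(dim)

    def update(self, x, lam=1.0):
        x = np.asarray(x, dtype=float)
        # Incremental form of a weighted mean: avg += (w / W_new) * (x - avg).
        self.lam_sum += lam
        self.standard += (lam / self.lam_sum) * (x - self.standard)
        w = self.count + 1.0
        self.weight_sum += w
        self.weighted += (w / self.weight_sum) * (x - self.weighted)
        self.count += 1

# Usage on a hypothetical trajectory of 20 iterates in R^4.
rng = np.random.default_rng(0)
iterates = rng.normal(size=(20, 4))
lams = rng.uniform(0.5, 1.5, size=20)
avg = ErgodicAverages(4)
for x, lam in zip(iterates, lams):
    avg.update(x, lam)
```

Each update costs one vector operation per average and never touches the trajectory of the algorithm itself.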

We emphasize that all steps in this algorithm can be computed in closed form, so implementation is easy 
and each iteration is quite cheap. 


5.4 Portfolio optimization 

[Figure 5.3 panels: (a) Fixed-point residual with and without line search (LS). (b) Objective value with and without line search (LS). (c) Comparison of ergodic and nonergodic iterates.] 

Fig. 5.3: Run time and convergence rate statistics for the SVM Problem (2.10) on the UCI “Adult” machine learning dataset [30]. Results are with the parameter choice that has the best generalization to the test set, $(C, \sigma) = (1, 2^{-3})$. 

In this section, we evaluate our algorithm on the portfolio optimization problem. In this problem, we have a choice to invest in $d > 0$ assets and our goal is to choose how to distribute our resources among all the assets so that we minimize investment risk and guarantee that our expected return on the investments is at least $r > 0$. Mathematically, we model the distribution of our assets with a vector $x \in \mathbb{R}^d$ where $x_i$ represents the percentage of our resources that we invest in asset $i$. For this reason, we define our constraint set $C_1 = \{x \in \mathbb{R}^d \mid \sum_{i=1}^d x_i = 1,\ x_i \ge 0\}$ to be the standard simplex. We also assume that we are given a vector of mean returns $m \in \mathbb{R}^d$ where $m_i$ represents the expected return from asset $i$, and we define $C_2 = \{x \in \mathbb{R}^d \mid \langle m, x \rangle \ge r\}$. Typically, we model the risk with a matrix $Q_0 \in \mathbb{R}^{d \times d}$, which is usually chosen as the covariance matrix of asset returns. However, we stray from the typical model by setting $Q = Q_0 + \mu I_{\mathbb{R}^d}$ for some $\mu \ge 0$, which has the effect of encouraging diversity of investments among the assets. In order to choose our optimal investment strategy, we solve Problem (2.10) with $Q$, $C_1$ and $C_2$ introduced here. 
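
Both constraint sets admit cheap projections, which is what makes them convenient building blocks for a splitting method. The sketch below is illustrative only: it uses the well-known sort-based simplex projection and the closed-form halfspace projection, and the function names are hypothetical rather than taken from the paper's implementation.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto {x : sum(x) = 1, x >= 0} (sort-based)."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                       # entries in decreasing order
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1] # last index kept positive
    theta = css[rho] / (rho + 1.0)             # common shift
    return np.maximum(v - theta, 0.0)

def project_halfspace(v, m, r):
    """Euclidean projection onto {x : <m, x> >= r}."""
    v = np.asarray(v, dtype=float)
    m = np.asarray(m, dtype=float)
    gap = r - m @ v
    if gap <= 0.0:                             # already feasible
        return v.copy()
    return v + (gap / (m @ m)) * m

p1 = project_simplex(np.array([0.5, 0.5, 2.0]))
p2 = project_simplex(np.array([0.2, 0.3, 0.5]))
p3 = project_halfspace(np.array([0.0, 0.0]), np.array([1.0, 1.0]), 1.0)
```

A point that is already feasible is returned unchanged by either routine, as the second and (after projection) third examples illustrate.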

In our numerical experiments, we solve a $d = 1000$ dimensional portfolio optimization problem with a 
randomly generated covariance matrix $Q_0$ (using the Matlab “gallery” function) and mean return vector $m$. 
We report our results in Figure 5.4. In order to get an estimate of the solution of Problem (2.10), we first 
solved this problem to high-accuracy using an interior point solver. 

The matrix $Q$ in this example is positive definite for any choice of $\mu \ge 0$, but the condition number of $Q_0$ is around 8000, while the condition number of $Q$ with $\mu = 0.1$ is around 5. For this reason, we see a huge improvement in Figure 5.4a with the acceleration in Algorithm 2, while in the case $\mu = 0$ in Figure 5.4b, the accelerated and non accelerated versions are nearly identical. 

[Figure 5.4 panels: (a) Distance to solution using accelerated and non accelerated methods. (b) Objective value of accelerated and non accelerated methods. One of the curves is covered by the other.] 

Fig. 5.4: Convergence rate statistics for the portfolio optimization problem in Section 5.4. 
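
The drop in condition number follows from a one-line computation: adding $\mu I$ shifts every eigenvalue of $Q_0$ by $\mu$, so the condition number becomes $(\lambda_{\max} + \mu)/(\lambda_{\min} + \mu)$. The sketch below checks this on a hypothetical matrix with a prescribed spectrum; it does not reproduce the Matlab “gallery” matrix used in the experiment.

```python
import numpy as np

def condition_numbers(Q0, mu):
    """Condition numbers of Q0 and of Q0 + mu * I for symmetric PD Q0."""
    eig = np.linalg.eigvalsh(Q0)               # ascending eigenvalues
    return eig[-1] / eig[0], (eig[-1] + mu) / (eig[0] + mu)

# Hypothetical covariance-like matrix with eigenvalues from 1e-4 to 1.
rng = np.random.default_rng(1)
basis, _ = np.linalg.qr(rng.normal(size=(100, 100)))
spectrum = np.geomspace(1e-4, 1.0, 100)
Q0 = (basis * spectrum) @ basis.T
Q0 = 0.5 * (Q0 + Q0.T)                         # enforce exact symmetry

cond_Q0, cond_Q = condition_numbers(Q0, mu=0.1)
```

For this synthetic spectrum the shifted matrix is far better conditioned, which is the regime in which a strongly monotone acceleration can be expected to help.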

We emphasize that all steps in this algorithm can be computed in nearly closed form, so implementation 
is easy and each iteration is quite cheap. 


6 Conclusion 

In this paper, we introduced a new operator-splitting algorithm for the three-operator monotone inclusion 
problem, which has a large variety of applications. We showed how to accelerate the algorithm whenever 
one of the involved operators is strongly monotone, and we also introduced a line search procedure and 
two averaging strategies that can improve the convergence rate. We characterized the convergence rate of 
the algorithm under various scenarios and showed that many of our rates are sharp. Finally, we introduced 
numerous applications of the algorithm and showed how it unifies many existing splitting schemes. 
A Proof of Theorem 1.2 


We first prove a useful inequality. 


Proposition A.1 Let $B$ be $\mu_B$-strongly monotone, where we allow the case $\mu_B = 0$. Suppose that $x_A^0 \in \mathcal{H}$ and set $x_B^0 = J_{\gamma_0 B}(x_A^0)$, $u_B^0 = (1/\gamma_0)(I - J_{\gamma_0 B})(x_A^0)$. For all $k \ge 0$, let 



(A.l) 


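1. Suppose that $C$ is $\beta$-cocoercive and $\mu_C$-strongly monotone. Let $\eta \in (0,1)$ and let $(\gamma_j)_{j \ge 0} \subseteq (0, 2(1-\eta)\beta)$. Then the following inequality holds for all $k \ge 0$: 

$$(1 + 2\gamma_k\mu_B)\|x_B^{k+1} - x^*\|^2 + \gamma_k^2\|u_B^{k+1} - u_B^*\|^2 + \left(1 - \frac{\gamma_k}{2(1-\eta)\beta}\right)\|x_A^k - x_B^k\|^2 \le (1 - 2\gamma_k\mu_C\eta)\|x_B^k - x^*\|^2 + \gamma_k^2\|u_B^k - u_B^*\|^2. \tag{A.2}$$

2. Suppose that $C$ is $L_C$-Lipschitz, but not necessarily strongly monotone. In addition, suppose that $\mu_B > 0$. Then the following inequality holds for all $k \ge 0$: 

$$\left(1 + 2\gamma_k(\mu_B - \gamma_k L_C^2/2)\right)\|x_B^{k+1} - x^*\|^2 + \gamma_k^2 L_C^2\|x_B^{k+1} - x^*\|^2 + \gamma_k^2\|u_B^{k+1} - u_B^*\|^2 \le \|x_B^k - x^*\|^2 + \gamma_k^2 L_C^2\|x_B^k - x^*\|^2 + \gamma_k^2\|u_B^k - u_B^*\|^2. \tag{A.3}$$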


Proof Fix $k \ge 0$. 

Part 1: Following Fig. 3.1 and Lemma 3.1, let 

$$u_A^k = \frac{1}{\gamma_k}\left((x_B^k - \gamma_k u_B^k - \gamma_k Cx_B^k) - J_{\gamma_k A}(x_B^k - \gamma_k u_B^k - \gamma_k Cx_B^k)\right) \in Ax_A^k.$$

In addition, $u_B^k \in Bx_B^k$ for all $k \ge 0$. The following identities from Fig. 3.1 will be useful in the proof: 
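$$x_A^k - x_B^k = -\gamma_k(u_A^k + u_B^k + Cx_B^k), \qquad x_B^{k+1} - x_A^k = \gamma_k(u_B^k - u_B^{k+1}), \qquad x_B^{k+1} - x_B^k = -\gamma_k(u_A^k + u_B^{k+1} + Cx_B^k).$$

First we bound the sum of two inner product terms. 

$$\begin{aligned}
&2\gamma_k\left(\langle x_A^k - x^*, u_A^k + Cx_B^k\rangle + \langle x_B^{k+1} - x^*, u_B^{k+1}\rangle\right) \\
&\quad= 2\gamma_k\left(\langle x_A^k - x_B^{k+1}, u_A^k + Cx_B^k\rangle + \langle x_B^{k+1} - x^*, u_A^k + Cx_B^k + u_B^{k+1}\rangle\right) \\
&\quad= 2\langle x_A^k - x_B^{k+1}, x_B^k - x_B^{k+1}\rangle + 2\langle x_B^{k+1} - x^*, x_B^k - x_B^{k+1}\rangle + 2\gamma_k\langle x_B^{k+1} - x_A^k, u_B^{k+1}\rangle \\
&\quad= \|x_B^k - x^*\|^2 - \|x_B^{k+1} - x^*\|^2 - \|x_A^k - x_B^k\|^2 + \|x_A^k - x_B^{k+1}\|^2 + 2\gamma_k\langle x_B^{k+1} - x_A^k, u_B^{k+1}\rangle \\
&\quad= \|x_B^k - x^*\|^2 - \|x_B^{k+1} - x^*\|^2 - \|x_A^k - x_B^k\|^2 + \gamma_k^2\|u_B^k - u_B^{k+1}\|^2 + 2\gamma_k^2\langle u_B^k - u_B^{k+1}, u_B^{k+1} - u_B^*\rangle + 2\gamma_k\langle x_B^{k+1} - x_A^k, u_B^*\rangle \\
&\quad= \|x_B^k - x^*\|^2 - \|x_B^{k+1} - x^*\|^2 - \|x_A^k - x_B^k\|^2 + \gamma_k^2\|u_B^k - u_B^*\|^2 - \gamma_k^2\|u_B^{k+1} - u_B^*\|^2 + 2\gamma_k\langle x_B^{k+1} - x_A^k, u_B^*\rangle. 
\end{aligned} \tag{A.4}$$

Furthermore, with $u_A^* \in Ax^*$ and $u_B^* \in Bx^*$ such that $u_A^* + u_B^* + Cx^* = 0$, we have the lower bound 

$$2\gamma_k\left(\langle x_A^k - x^*, u_A^k + Cx_B^k\rangle + \langle x_B^{k+1} - x^*, u_B^{k+1}\rangle\right) \ge 2\gamma_k\left(\langle x_A^k - x^*, u_A^* + Cx_B^k\rangle + \langle x_B^{k+1} - x^*, u_B^*\rangle\right) + 2\gamma_k\mu_B\|x_B^{k+1} - x^*\|^2. \tag{A.5}$$

We have the further lower bound: For all $\eta \in (0,1)$, we have 

$$\begin{aligned}
2\langle x_A^k - x^*, Cx_B^k\rangle &= 2\langle x_A^k - x_B^k, Cx_B^k - Cx^*\rangle + 2\langle x_A^k - x_B^k, Cx^*\rangle + 2\langle x_B^k - x^*, Cx_B^k\rangle \\
&\ge -\frac{1}{2\beta(1-\eta)}\|x_A^k - x_B^k\|^2 - 2\beta(1-\eta)\|Cx_B^k - Cx^*\|^2 + 2\mu_C\eta\|x_B^k - x^*\|^2 \\
&\qquad + 2\beta(1-\eta)\|Cx_B^k - Cx^*\|^2 + 2\langle x_A^k - x^*, Cx^*\rangle. 
\end{aligned} \tag{A.6}$$

Altogether, we have 

$$\begin{aligned}
&2\gamma_k\left(\langle x_A^k - x^*, u_A^k + Cx_B^k\rangle + \langle x_B^{k+1} - x^*, u_B^{k+1}\rangle\right) \\
&\quad\ge 2\gamma_k\left(\langle x_A^k - x^*, u_A^* + Cx^*\rangle + \langle x_B^{k+1} - x^*, u_B^*\rangle\right) - \frac{\gamma_k}{2(1-\eta)\beta}\|x_A^k - x_B^k\|^2 + 2\gamma_k\mu_C\eta\|x_B^k - x^*\|^2 + 2\gamma_k\mu_B\|x_B^{k+1} - x^*\|^2 \\
&\quad= 2\gamma_k\langle x_B^{k+1} - x_A^k, u_B^*\rangle - \frac{\gamma_k}{2(1-\eta)\beta}\|x_A^k - x_B^k\|^2 + 2\gamma_k\mu_C\eta\|x_B^k - x^*\|^2 + 2\gamma_k\mu_B\|x_B^{k+1} - x^*\|^2. 
\end{aligned} \tag{A.7}$$

Thus, combine (A.4) and (A.7) to get 

$$(1 + 2\gamma_k\mu_B)\|x_B^{k+1} - x^*\|^2 + \gamma_k^2\|u_B^{k+1} - u_B^*\|^2 + \left(1 - \frac{\gamma_k}{2(1-\eta)\beta}\right)\|x_A^k - x_B^k\|^2 \le (1 - 2\gamma_k\mu_C\eta)\|x_B^k - x^*\|^2 + \gamma_k^2\|u_B^k - u_B^*\|^2.$$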

Part 2: This follows the exact same reasoning, except we replace Equation (A.6) with the following lower bound: 

$$\begin{aligned}
2\langle x_A^k - x^*, Cx_B^k\rangle &= 2\langle x_A^k - x_B^k, Cx_B^k - Cx^*\rangle + 2\langle x_A^k - x_B^k, Cx^*\rangle + 2\langle x_B^k - x^*, Cx_B^k\rangle \\
&\ge -\frac{1}{\gamma_k}\|x_A^k - x_B^k\|^2 - \gamma_k\|Cx_B^k - Cx^*\|^2 + 2\langle x_A^k - x^*, Cx^*\rangle \\
&\ge -\frac{1}{\gamma_k}\|x_A^k - x_B^k\|^2 - \gamma_k L_C^2\|x_B^k - x^*\|^2 + 2\langle x_A^k - x^*, Cx^*\rangle. 
\end{aligned}$$

□ 


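We are now ready to prove Theorem 1.2. 

Proof (Theorem 1.2) Part 1: The definition of $\gamma_{k+1}$ ensures that 

$$\frac{1 + 2\gamma_k\mu_B}{\gamma_k^2} = \frac{1 - 2\gamma_{k+1}\mu_C\eta}{\gamma_{k+1}^2}.$$

Therefore, by (A.2), the following inequality holds for all $k \ge 0$: 

$$\frac{1 - 2\gamma_{k+1}\mu_C\eta}{\gamma_{k+1}^2}\|x_B^{k+1} - x^*\|^2 + \|u_B^{k+1} - u_B^*\|^2 \le \frac{1 - 2\gamma_k\mu_C\eta}{\gamma_k^2}\|x_B^k - x^*\|^2 + \|u_B^k - u_B^*\|^2.$$

Now observe that from Equation (1.8), we have $\gamma_k \to 0$ as $k \to \infty$. Therefore, 

$$\frac{\gamma_k}{\gamma_{k+1}} = \sqrt{\frac{1 + 2\gamma_k\mu_B}{1 - 2\gamma_{k+1}\mu_C\eta}} \to 1 \quad \text{as } k \to \infty.$$

In addition, the sequence $(1/\gamma_j)_{j \ge 0}$ is increasing: 

$$\gamma_k^2 - \gamma_{k+1}^2 = \gamma_k\gamma_{k+1}\left(2\gamma_{k+1}\mu_B + 2\gamma_k\mu_C\eta\right) > 0.$$

Thus, we apply the Stolz-Cesaro theorem to compute the following limit: 

$$\begin{aligned}
\lim_{k\to\infty}(k+1)\gamma_k &= \lim_{k\to\infty}\frac{k+1}{1/\gamma_k} = \lim_{k\to\infty}\frac{(k+2) - (k+1)}{1/\gamma_{k+1} - 1/\gamma_k} = \lim_{k\to\infty}\frac{\gamma_k\gamma_{k+1}}{\gamma_k - \gamma_{k+1}} = \lim_{k\to\infty}\frac{\gamma_k\gamma_{k+1}(\gamma_{k+1} + \gamma_k)}{\gamma_k^2 - \gamma_{k+1}^2} \\
&= \lim_{k\to\infty}\frac{\gamma_k\gamma_{k+1}(\gamma_{k+1} + \gamma_k)}{\gamma_k\gamma_{k+1}\left(2\gamma_{k+1}\mu_B + 2\gamma_k\mu_C\eta\right)} = \lim_{k\to\infty}\frac{1 + \gamma_k/\gamma_{k+1}}{2\mu_B + 2(\gamma_k/\gamma_{k+1})\mu_C\eta} = \frac{1}{\mu_C\eta + \mu_B}. 
\end{aligned}$$

Thus, we have 

$$\|x_B^{k+1} - x^*\|^2 \le \frac{\gamma_{k+1}^2}{1 - 2\gamma_{k+1}\mu_C\eta}\left(\frac{1 - 2\gamma_0\mu_C\eta}{\gamma_0^2}\|x_B^0 - x^*\|^2 + \|u_B^0 - u_B^*\|^2\right) = O\left(\frac{1}{(k+1)^2}\right).$$

Part 2: The proof is nearly identical to the proof of Part 1. The difference is that the definition of $\gamma_{k+1}$ ensures that for all $k \ge 0$, we have 

$$\frac{1}{\gamma_{k+1}^2} = \frac{1 + 2\gamma_k(\mu_B - \gamma_k L_C^2/2)}{\gamma_k^2}.$$

In addition, we have $\gamma_k \to 0$ as $k \to \infty$. The sequence $(1/\gamma_j)_{j \ge 0}$ is also increasing because $\gamma_k < 2\mu_B/L_C^2$ for all $k \ge 0$. Finally, we note that $\gamma_k/\gamma_{k+1} \to 1$ as $k \to \infty$. Thus, we apply the Stolz-Cesaro theorem to compute the following limit: 

$$\begin{aligned}
\lim_{k\to\infty}(k+1)\gamma_k &= \lim_{k\to\infty}\frac{k+1}{1/\gamma_k} = \lim_{k\to\infty}\frac{(k+2) - (k+1)}{1/\gamma_{k+1} - 1/\gamma_k} = \lim_{k\to\infty}\frac{\gamma_k\gamma_{k+1}}{\gamma_k - \gamma_{k+1}} = \lim_{k\to\infty}\frac{\gamma_k\gamma_{k+1}(\gamma_{k+1} + \gamma_k)}{\gamma_k^2 - \gamma_{k+1}^2} \\
&= \lim_{k\to\infty}\frac{\gamma_k\gamma_{k+1}(\gamma_{k+1} + \gamma_k)}{2\gamma_{k+1}^2\gamma_k(\mu_B - \gamma_k L_C^2/2)} = \lim_{k\to\infty}\frac{\gamma_{k+1} + \gamma_k}{\gamma_{k+1}(2\mu_B - \gamma_k L_C^2)} = \lim_{k\to\infty}\frac{1 + \gamma_k/\gamma_{k+1}}{2\mu_B - \gamma_k L_C^2} = \frac{1}{\mu_B}. 
\end{aligned}$$

□ 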
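
The limit in Part 1 can be sanity-checked numerically. The sketch below assumes that $\gamma_{k+1}$ is obtained by solving the displayed relation $(1 + 2\gamma_k\mu_B)/\gamma_k^2 = (1 - 2\gamma_{k+1}\mu_C\eta)/\gamma_{k+1}^2$ for its positive root; the function names and the parameter values are hypothetical.

```python
import math

def next_gamma(gamma, mu_B, mu_C, eta):
    """Positive root g of a*g^2 + 2*m*g - 1 = 0, where
    a = (1 + 2*gamma*mu_B) / gamma^2 and m = mu_C * eta."""
    a = (1.0 + 2.0 * gamma * mu_B) / gamma ** 2
    m = mu_C * eta
    # Algebraically equal to (-m + sqrt(m^2 + a)) / a, but better conditioned.
    return 1.0 / (m + math.sqrt(m * m + a))

def scaled_gamma(gamma0, mu_B, mu_C, eta, n):
    """Return (n + 1) * gamma_n after n steps of the recursion."""
    gamma = gamma0
    for _ in range(n):
        gamma = next_gamma(gamma, mu_B, mu_C, eta)
    return (n + 1) * gamma

mu_B, mu_C, eta = 1.0, 1.0, 0.5
value = scaled_gamma(0.5, mu_B, mu_C, eta, 200000)
limit = 1.0 / (mu_C * eta + mu_B)
```

With these illustrative parameters the scaled step size approaches $1/(\mu_C\eta + \mu_B)$, in line with the Stolz-Cesaro computation, although the convergence is slow (roughly logarithmic corrections over $k$).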
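B Derivation of Algorithm 11 

Observe the following identities from Fig. 3.1 and Lemma 3.1: 

$$x_A^k - x_B^k = -\gamma(u_B^k + Cx_B^k + u_A^k); \tag{B.1a}$$
$$x_B^{k+1} - x_A^k = \gamma(u_B^k - u_B^{k+1}); \tag{B.1b}$$
$$x_B^{k+1} - x_B^k = -\gamma(u_B^{k+1} + Cx_B^k + u_A^k). \tag{B.1c}$$

These give us the further subgradient identity: 

$$\begin{aligned}
u_A^{k+1} &= u_A^k + (u_A^{k+1} + u_B^{k+1} + Cx_B^{k+1}) + (Cx_B^k - Cx_B^{k+1}) - (u_A^k + u_B^{k+1} + Cx_B^k) \\
&= u_A^k + \frac{1}{\gamma}(x_B^{k+1} - x_A^{k+1}) + (Cx_B^k - Cx_B^{k+1}) + \frac{1}{\gamma}(x_B^{k+1} - x_B^k) \\
&= J_{\frac{1}{\gamma}A^{-1}}\left(u_A^k + \frac{1}{\gamma}(2x_B^{k+1} - x_B^k) + (Cx_B^k - Cx_B^{k+1})\right), 
\end{aligned}$$

where the first equality follows from cancellation, the second from (B.1), and the third from the property: 

$$\text{for any } v \in \mathcal{H}, \quad u_A^{k+1} = v - \frac{1}{\gamma}x_A^{k+1},\ (x_A^{k+1}, u_A^{k+1}) \in \operatorname{gra} A \iff u_A^{k+1} = J_{\frac{1}{\gamma}A^{-1}}(v),$$

which follows from the definition of the resolvent $J_{\frac{1}{\gamma}A^{-1}}$. In addition, 

$$x_B^{k+1} = x_B^k - \gamma(u_B^{k+1} + Cx_B^k + u_A^k) = J_{\gamma B}\left(x_B^k - \gamma Cx_B^k - \gamma u_A^k\right),$$

where the second equality follows from the property 

$$\text{for any } v \in \mathcal{H}, \quad x_B^{k+1} = v - \gamma u_B^{k+1},\ (x_B^{k+1}, u_B^{k+1}) \in \operatorname{gra} B \iff x_B^{k+1} = J_{\gamma B}(v).$$

Altogether, for all $k \ge 0$, we have 

$$\begin{aligned}
x_B^{k+1} &= J_{\gamma B}\left(x_B^k - \gamma Cx_B^k - \gamma u_A^k\right); \\
u_A^{k+1} &= J_{\frac{1}{\gamma}A^{-1}}\left(u_A^k + \frac{1}{\gamma}(2x_B^{k+1} - x_B^k) + (Cx_B^k - Cx_B^{k+1})\right). 
\end{aligned}$$

Algorithm 11 is obtained with the change of variable: $x^k \leftarrow x_B^k$ and $y^k \leftarrow u_A^k$. 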

C Proofs from Section 3 

Proof (of Lemma 3.2) Let $x \in \operatorname{zer}(A + B + C)$, that is, $0 \in (A + B + C)x$. Let $u_A \in Ax$ and $u_B \in Bx$ be such that $u_A + u_B + Cx = 0$. In addition, let $z = x + \gamma u_B$. We will show that $z$ is a fixed point of $T$. Then $J_{\gamma B}(z) = x$ and $2J_{\gamma B}(z) - z - \gamma CJ_{\gamma B}(z) = 2x - z - \gamma Cx = x - \gamma Cx - \gamma u_B = x + \gamma u_A$. Thus, $x = J_{\gamma A}(x + \gamma u_A) = J_{\gamma A}(2J_{\gamma B}(z) - z - \gamma CJ_{\gamma B}(z))$. Therefore, 

$$\begin{aligned}
Tz &= T(x + \gamma u_B) \\
&= J_{\gamma A}\left(2J_{\gamma B}(z) - z - \gamma CJ_{\gamma B}(z)\right) + (I - J_{\gamma B})(z) \\
&= x + \gamma u_B \\
&= z. 
\end{aligned}$$

Next, suppose that $z \in \operatorname{Fix} T$. Then there exist $u_B \in B(J_{\gamma B}(z))$ and $u_A \in A(J_{\gamma A}(2J_{\gamma B}(z) - z - \gamma CJ_{\gamma B}(z)))$ such that 

$$\begin{aligned}
z &= Tz \\
&= z + J_{\gamma A}\left(2J_{\gamma B}(z) - z - \gamma CJ_{\gamma B}(z)\right) - J_{\gamma B}(z) \\
&= z - \gamma\left(u_A + u_B + CJ_{\gamma B}(z)\right). 
\end{aligned}$$

Thus, $x = J_{\gamma A}(2J_{\gamma B}(z) - z - \gamma CJ_{\gamma B}(z)) = J_{\gamma B}(z)$ and $u_A + u_B + Cx = 0$. Therefore, $x = J_{\gamma B}(z) \in \operatorname{zer}(A + B + C)$. 

The identity for $\operatorname{Fix} T$ immediately follows from the fixed-point construction process in the first paragraph. □ 
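
The fixed-point construction in this proof can be illustrated on a toy problem. The sketch below is an assumption-laden example, not part of the paper: it takes $A x = x - a$, $B x = x - b$ and $C x = x - c$ (gradients of three simple quadratics, so $C$ is 1-cocoercive), for which the resolvents have closed forms and the zero of $A + B + C$ is $(a + b + c)/3$. The names `resolvent_shift` and `T` are hypothetical.

```python
import numpy as np

def resolvent_shift(v, gamma, center):
    """Resolvent of gamma * (x - center): solves x + gamma * (x - center) = v."""
    return (v + gamma * center) / (1.0 + gamma)

def T(z, gamma, a, b, c):
    """T = I - J_{gamma B} + J_{gamma A} o (2 J_{gamma B} - I - gamma C o J_{gamma B})."""
    x_B = resolvent_shift(z, gamma, b)
    x_A = resolvent_shift(2.0 * x_B - z - gamma * (x_B - c), gamma, a)
    return z + x_A - x_B

rng = np.random.default_rng(0)
a, b, c = rng.normal(size=(3, 5))
gamma = 0.5

x_star = (a + b + c) / 3.0            # zero of A + B + C
u_B = x_star - b                      # element of B x_star
z_star = x_star + gamma * u_B         # fixed point built as in the proof

# Iterating T from zero should approach the same fixed point.
z = np.zeros(5)
for _ in range(100):
    z = T(z, gamma, a, b, c)
```

In this example $T$ is an affine contraction, so plain iteration recovers $z^*$ and $J_{\gamma B}(z^*)$ returns the solution $x^*$.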

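Proof (of Lemma 3.3) Let $z, w \in \mathcal{H}$. Then 

$$\begin{aligned}
\|Sz - Sw\|^2 &= \|Uz - Uw\|^2 + \|T_1 \circ Vz - T_1 \circ Vw\|^2 + 2\langle T_1 \circ Vz - T_1 \circ Vw, Uz - Uw\rangle \\
&\le \langle Uz - Uw, z - w\rangle + \langle T_1 \circ Vz - T_1 \circ Vw, Vz - Vw\rangle + 2\langle T_1 \circ Vz - T_1 \circ Vw, Uz - Uw\rangle \\
&= \langle Uz - Uw, z - w\rangle + \langle T_1 \circ Vz - T_1 \circ Vw, (2U + V)z - (2U + V)w\rangle \\
&= \langle Uz - Uw, z - w\rangle + \langle T_1 \circ Vz - T_1 \circ Vw, (I - W)z - (I - W)w\rangle \\
&= \langle Sz - Sw, z - w\rangle - \langle T_1 \circ Vz - T_1 \circ Vw, Wz - Ww\rangle, 
\end{aligned}$$

where the inequality follows from the firm nonexpansiveness of $U$ and $T_1$. Then, the result follows from the identity: 

$$\langle Sz - Sw, z - w\rangle = \frac{1}{2}\|z - w\|^2 - \frac{1}{2}\|(I_{\mathcal{H}} - S)z - (I_{\mathcal{H}} - S)w\|^2 + \frac{1}{2}\|Sz - Sw\|^2.$$

□ 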


D Proofs for convergence rate analysis 

We now recall a lower bound property for convex functions that are strongly convex and Lipschitz differentiable. The first bound is a consequence of [3, Theorem 18.15] and the second bound is a combination of [3, Theorem 18.15] and [35, Theorem 2.1.12]. 

Proposition D.1 Suppose that $f : \mathcal{H} \to (-\infty, \infty]$ is $\mu$-strongly convex and $(1/\beta)$-Lipschitz differentiable. For all $x, y \in \operatorname{dom}(f)$, let 

$$S_f(x, y) := \max\left\{\frac{\beta}{2}\|\nabla f(x) - \nabla f(y)\|^2,\ \frac{\mu}{2}\|x - y\|^2\right\};$$
$$Q_f(x, y) := \max\left\{2S_f(x, y),\ \frac{\mu}{1 + \mu\beta}\|x - y\|^2 + \frac{\beta}{1 + \mu\beta}\|\nabla f(x) - \nabla f(y)\|^2\right\}.$$

Then for all $x, y \in \operatorname{dom}(f)$, we have 

$$f(x) - f(y) - \langle x - y, \nabla f(y)\rangle \ge S_f(x, y);$$
$$\langle \nabla f(x) - \nabla f(y), x - y\rangle \ge Q_f(x, y).$$

Similarly, if $A : \mathcal{H} \to \mathcal{H}$ is $\mu$-strongly monotone and $\beta$-cocoercive, we let 

$$Q_A(x, y) := \max\left\{\mu\|x - y\|^2,\ \beta\|Ax - Ay\|^2\right\}$$

for all $x, y \in \operatorname{dom}(A)$. Then for all $x, y \in \operatorname{dom}(A)$, we have 

$$\langle Ax - Ay, x - y\rangle \ge Q_A(x, y).$$

We follow the convention that every function $f$ is $\mu_f \ge 0$ strongly convex and $(1/\beta_f) \ge 0$ Lipschitz differentiable, where we allow the possibility that $\beta_f = \mu_f = 0$. With this notation, the results of Proposition D.1 continue to hold for all $f$. We follow the same convention for monotone operators. In particular, every monotone operator $A : \mathcal{H} \to 2^{\mathcal{H}}$ is $\mu_A$-strongly monotone and $\beta_A$-cocoercive where $\mu_A \ge 0$ and $\beta_A \ge 0$. Finally, we follow the convention that $Q_{\partial f} = Q_f$. 

Note that we could extend our definition of $Q_A(\cdot, \cdot)$ (or $Q_f(\cdot, \cdot)$) to the case where $A$ is merely strongly monotone in a subset of the coordinates of $\mathcal{H}$ (which is then assumed to be a product space). This extension is straightforward, though slightly messy. Thus, we omit this extension. 
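
As a concrete instance of the operator bound, consider a linear operator $Ax = Mx$ with $M$ symmetric positive definite: it is $\lambda_{\min}(M)$-strongly monotone and $(1/\lambda_{\max}(M))$-cocoercive. The sketch below, with a hypothetical random matrix and illustrative function names, evaluates $Q_A$ and checks the inequality $\langle Ax - Ay, x - y\rangle \ge Q_A(x, y)$ on random pairs.

```python
import numpy as np

def make_Q_A(M):
    """Build Q_A for A x = M x, with M symmetric positive definite."""
    eig = np.linalg.eigvalsh(M)
    mu, beta = eig[0], 1.0 / eig[-1]   # strong monotonicity, cocoercivity

    def Q_A(x, y):
        diff = x - y
        image = M @ diff
        return max(mu * (diff @ diff), beta * (image @ image))

    return Q_A

rng = np.random.default_rng(0)
G = rng.normal(size=(6, 6))
M = G @ G.T + 0.1 * np.eye(6)          # hypothetical SPD matrix
Q_A = make_Q_A(M)

gaps = []
for _ in range(200):
    x, y = rng.normal(size=(2, 6))
    inner = (M @ (x - y)) @ (x - y)
    gaps.append(inner - Q_A(x, y))     # should be nonnegative
min_gap = min(gaps)
```

The gap is zero only when $x - y$ is an extreme eigenvector of $M$, which shows that neither term inside the maximum can be dropped in general.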

The following identity will be applied repeatedly: 


(D.3) 

(D.4) 


(D.l) 

(D.2) 
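Proposition D.2 Let $z \in \mathcal{H}$, let $z^*$ be a fixed point of $T$, let $\gamma > 0$, let $\lambda > 0$, and let $z^+ = (1 - \lambda)z + \lambda Tz$. Then 

$$\begin{aligned}
&2\gamma\lambda\langle x_B - x_A, u_B^* + Cx^*\rangle + 2\gamma\lambda Q_A(x_A, x^*) + 2\gamma\lambda Q_B(x_B, x^*) + 2\gamma\lambda Q_C(x_B, x^*) && \text{(D.5)} \\
&\quad\le 2\gamma\lambda\langle x_A - x^*, u_A\rangle + 2\gamma\lambda\langle x_B - x^*, u_B + Cx_B\rangle \\
&\quad= \|z - x^*\|^2 - \|z^+ - x^*\|^2 + \left(1 - \frac{2}{\lambda}\right)\|z - z^+\|^2 + 2\gamma\langle z - z^+, Cx_B\rangle && \text{(D.6)} \\
&\quad= \|z - z^*\|^2 - \|z^+ - z^*\|^2 + \left(1 - \frac{2}{\lambda}\right)\|z - z^+\|^2 + 2\gamma\langle z - z^+, Cx_B + u_B^*\rangle, && \text{(D.7)}
\end{aligned}$$

where $x_A \in \operatorname{dom}(A)$, $x_B \in \operatorname{dom}(B)$, $u_B \in Bx_B$ and $u_A \in Ax_A$ are defined in Lemma 3.1 and Equation (D.6) holds for all $x^* \in \mathcal{H}$, while Equations (D.5) and (D.7) hold when $x^* = J_{\gamma B}(z^*)$ and $u_B^* = (1/\gamma)(z^* - x^*)$. In particular, when $x^* = J_{\gamma B}(z^*)$, we have 

$$\begin{aligned}
&\|z^+ - z^*\|^2 + \left(\frac{2}{\lambda} - 1\right)\|z - z^+\|^2 + 2\gamma\lambda Q_A(x_A, x^*) + 2\gamma\lambda Q_B(x_B, x^*) + 2\gamma\lambda Q_C(x_B, x^*) \\
&\quad\le \|z - z^*\|^2 + 2\gamma\langle z - z^+, C(x_B) - C(x^*)\rangle. 
\end{aligned} \tag{D.8}$$

Proof First we show inequality (D.5): Let $u_A^* \in Ax^*$ and $u_B^* \in Bx^*$ be such that $u_A^* + u_B^* + Cx^* = 0$. Then 

$$\begin{aligned}
&2\gamma\lambda\langle x_A - x^*, u_A\rangle + 2\gamma\lambda\langle x_B - x^*, u_B + Cx_B\rangle \\
&\quad\ge 2\gamma\lambda\langle x_A - x^*, u_A^*\rangle + 2\gamma\lambda\langle x_B - x^*, u_B^* + Cx^*\rangle + 2\gamma\lambda Q_A(x_A, x^*) + 2\gamma\lambda Q_B(x_B, x^*) + 2\gamma\lambda Q_C(x_B, x^*) \\
&\quad= 2\gamma\lambda\langle x_A - x^*, u_A^* + u_B^* + Cx^*\rangle + 2\gamma\lambda\langle x_B - x_A, u_B^* + Cx^*\rangle + 2\gamma\lambda Q_A(x_A, x^*) + 2\gamma\lambda Q_B(x_B, x^*) + 2\gamma\lambda Q_C(x_B, x^*) \\
&\quad= 2\gamma\lambda\langle x_B - x_A, u_B^* + Cx^*\rangle + 2\gamma\lambda Q_A(x_A, x^*) + 2\gamma\lambda Q_B(x_B, x^*) + 2\gamma\lambda Q_C(x_B, x^*). 
\end{aligned}$$

Now we show Equation (D.6): 

$$\begin{aligned}
&2\gamma\lambda\langle x_A - x^*, u_A\rangle + 2\gamma\lambda\langle x_B - x^*, u_B + Cx_B\rangle \\
&\quad= 2\gamma\lambda\langle x_A - x_B, u_A\rangle + 2\gamma\lambda\langle x_B - x^*, u_A + u_B + Cx_B\rangle \\
&\quad= 2\lambda\langle x_A - x_B, \gamma u_A\rangle + 2\lambda\langle x_B - x^*, x_B - x_A\rangle \\
&\quad= 2\lambda\langle x_B - \gamma u_A - x^*, x_B - x_A\rangle \\
&\quad= 2\langle z + (x_B - z - \gamma u_A) - x^*, z - z^+\rangle \\
&\quad= 2\langle z - \gamma(u_B + u_A + Cx_B) - x^*, z - z^+\rangle + 2\gamma\langle z - z^+, Cx_B\rangle \\
&\quad= 2\left\langle z - \frac{1}{\lambda}(z - z^+) - x^*, z - z^+\right\rangle + 2\gamma\langle z - z^+, Cx_B\rangle \\
&\quad= 2\langle z - x^*, z - z^+\rangle - \frac{2}{\lambda}\|z - z^+\|^2 + 2\gamma\langle z - z^+, Cx_B\rangle \\
&\quad\overset{(1.12)}{=} \|z - x^*\|^2 - \|z^+ - x^*\|^2 + \left(1 - \frac{2}{\lambda}\right)\|z - z^+\|^2 + 2\gamma\langle z - z^+, Cx_B\rangle. 
\end{aligned}$$

Now assume that $x^* = J_{\gamma B}(z^*)$ and show Equation (D.7): 

$$\begin{aligned}
&2\gamma\lambda\langle x_A - x^*, u_A\rangle + 2\gamma\lambda\langle x_B - x^*, u_B + Cx_B\rangle \\
&\quad= 2\langle z - x^*, z - z^+\rangle - \frac{2}{\lambda}\|z - z^+\|^2 + 2\gamma\langle z - z^+, Cx_B\rangle \\
&\quad= 2\langle z - z^*, z - z^+\rangle - \frac{2}{\lambda}\|z - z^+\|^2 + 2\gamma\langle z - z^+, Cx_B + u_B^*\rangle \\
&\quad\overset{(1.12)}{=} \|z - z^*\|^2 - \|z^+ - z^*\|^2 + \left(1 - \frac{2}{\lambda}\right)\|z - z^+\|^2 + 2\gamma\langle z - z^+, Cx_B + u_B^*\rangle. 
\end{aligned}$$

Equation (D.8) follows from rearranging the above inequalities. □ 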
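
Identity (D.6) holds for every reference point $x^*$, so it can be verified numerically. The following sketch is only an illustration under stated assumptions: it uses the toy operators $Ax = x - a$, $Bx = x - b$, $Cx = x - c$, and it takes the points of Lemma 3.1 to be $x_B = J_{\gamma B}(z)$, $u_B = (z - x_B)/\gamma$, $x_A = J_{\gamma A}(2x_B - z - \gamma Cx_B)$ and $u_A = (2x_B - z - \gamma Cx_B - x_A)/\gamma$. The function names are hypothetical.

```python
import numpy as np

def lemma_points(z, gamma, a, b, c):
    """Points x_A, x_B and dual elements u_A, u_B for shifted-identity operators."""
    x_B = (z + gamma * b) / (1.0 + gamma)          # J_{gamma B}(z)
    u_B = (z - x_B) / gamma                        # element of B x_B
    v = 2.0 * x_B - z - gamma * (x_B - c)          # argument of J_{gamma A}
    x_A = (v + gamma * a) / (1.0 + gamma)          # J_{gamma A}(v)
    u_A = (v - x_A) / gamma                        # element of A x_A
    return x_A, x_B, u_A, u_B

def d6_sides(z, x_ref, gamma, lam, a, b, c):
    """Return the two sides of identity (D.6) for an arbitrary reference point."""
    x_A, x_B, u_A, u_B = lemma_points(z, gamma, a, b, c)
    z_plus = z + lam * (x_A - x_B)                 # (1 - lam) z + lam T z
    Cx_B = x_B - c
    lhs = 2.0 * gamma * lam * ((x_A - x_ref) @ u_A + (x_B - x_ref) @ (u_B + Cx_B))
    step = z - z_plus
    rhs = ((z - x_ref) @ (z - x_ref) - (z_plus - x_ref) @ (z_plus - x_ref)
           + (1.0 - 2.0 / lam) * (step @ step) + 2.0 * gamma * (step @ Cx_B))
    return lhs, rhs

rng = np.random.default_rng(0)
a, b, c, z, x_ref = rng.normal(size=(5, 4))
lhs, rhs = d6_sides(z, x_ref, gamma=0.7, lam=1.3, a=a, b=b, c=c)
```

The two sides agree to rounding error for any choice of `z`, `x_ref`, `gamma` and `lam`, which mirrors the claim that (D.6) needs no assumption on $x^*$.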
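Corollary D.1 (Function value bounds) Assume the notation of Proposition D.2. Let $f$, $g$, and $h$ be closed, proper and convex functions from $\mathcal{H}$ to $(-\infty, \infty]$. Suppose that $h$ is $(1/\beta)$-Lipschitz differentiable. Suppose that $A = \partial f$, $B = \partial g$, and $C = \nabla h$. Then if $x^* = \operatorname{prox}_{\gamma g}(z^*)$, $\widetilde{\nabla} g(x^*) = (1/\gamma)(z^* - x^*)$, and $\widetilde{\nabla} f(x^*) \in \partial f(x^*)$ and $\widetilde{\nabla} g(x^*) \in \partial g(x^*)$ are such that $\nabla h(x^*) + \widetilde{\nabla} g(x^*) + \widetilde{\nabla} f(x^*) = 0$, we have 

$$\begin{aligned}
&2\gamma\langle z - z^+, \widetilde{\nabla} g(x^*) + \nabla h(x^*)\rangle + 4\gamma\lambda S_f(x_f, x^*) + 4\gamma\lambda S_g(x_g, x^*) + 4\gamma\lambda S_h(x_g, x^*) && \text{(D.9)} \\
&\quad\le 2\gamma\lambda\left(f(x_f) + g(x_g) + h(x_g) - (f + g + h)(x^*) + S_f(x_f, x^*) + S_g(x_g, x^*) + S_h(x_g, x^*)\right) \\
&\quad\le \|z - x^*\|^2 - \|z^+ - x^*\|^2 + \left(1 - \frac{2}{\lambda}\right)\|z - z^+\|^2 + 2\gamma\langle z - z^+, \nabla h(x_g)\rangle && \text{(D.10)} \\
&\quad= \|z - z^*\|^2 - \|z^+ - z^*\|^2 + \left(1 - \frac{2}{\lambda}\right)\|z - z^+\|^2 + 2\gamma\langle z - z^+, \nabla h(x_g) + \widetilde{\nabla} g(x^*)\rangle, && \text{(D.11)}
\end{aligned}$$

where $x_f \in \operatorname{dom}(f)$, $x_g \in \operatorname{dom}(g)$ are defined in Lemma 3.1 and Equation (D.10) holds for all $x^* \in \mathcal{H}$, while Equations (D.9) and (D.11) hold when $x^* = \operatorname{prox}_{\gamma g}(z^*)$ and $\widetilde{\nabla} g(x^*) = (1/\gamma)(z^* - x^*)$. In particular, when $x^* = \operatorname{prox}_{\gamma g}(z^*)$, we have 

$$\begin{aligned}
&\|z^+ - z^*\|^2 + \left(\frac{2}{\lambda} - 1\right)\|z - z^+\|^2 + 4\gamma\lambda S_f(x_f, x^*) + 4\gamma\lambda S_g(x_g, x^*) + 4\gamma\lambda S_h(x_g, x^*) \\
&\quad\le \|z - z^*\|^2 + 2\gamma\langle z - z^+, \nabla h(x_g) - \nabla h(x^*)\rangle. 
\end{aligned} \tag{D.12}$$

Proof Equation (D.11) is a direct consequence of Proposition D.2 together with the inequalities: 

$$\begin{aligned}
&f(x_f) + g(x_g) + h(x_g) - (f + g + h)(x^*) \\
&\quad\le \langle x_f - x^*, \widetilde{\nabla} f(x_f)\rangle + \langle x_g - x^*, \widetilde{\nabla} g(x_g) + \nabla h(x_g)\rangle - S_f(x_f, x^*) - S_g(x_g, x^*) - S_h(x_g, x^*); \\
&f(x_f) + g(x_g) + h(x_g) - (f + g + h)(x^*) \\
&\quad\ge \langle x_f - x^*, \widetilde{\nabla} f(x^*)\rangle + \langle x_g - x^*, \widetilde{\nabla} g(x^*) + \nabla h(x^*)\rangle + S_f(x_f, x^*) + S_g(x_g, x^*) + S_h(x_g, x^*) \\
&\quad= \langle x_g - x_f, \widetilde{\nabla} g(x^*) + \nabla h(x^*)\rangle + \langle x_f - x^*, \widetilde{\nabla} f(x^*) + \widetilde{\nabla} g(x^*) + \nabla h(x^*)\rangle + S_f(x_f, x^*) + S_g(x_g, x^*) + S_h(x_g, x^*) \\
&\quad= \frac{1}{\lambda}\langle z - z^+, \widetilde{\nabla} g(x^*) + \nabla h(x^*)\rangle + S_f(x_f, x^*) + S_g(x_g, x^*) + S_h(x_g, x^*), 
\end{aligned}$$

where we use that $x_g - x_f = (1/\lambda)(z - z^+)$ (see Lemma 3.1). 

Equation (D.12) is a consequence of Equation (D.8). □ 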

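Corollary D.2 (Subdifferentiable + monotone model variational inequality bounds) Assume the notation of Proposition D.2. Let $f$, $g$, and $h$ be closed, proper and convex functions from $\mathcal{H}$ to $(-\infty, \infty]$, and let $\nabla h$ be $(1/\beta_h)$-Lipschitz. Let $\bar{A}$, $\bar{B}$ and $\bar{C}$ be monotone operators on $\mathcal{H}$, and let $\bar{C}$ be $\beta_{\bar{C}}$-cocoercive. Suppose that $A = \partial f + \bar{A}$, $B = \partial g + \bar{B}$, and $C = \nabla h + \bar{C}$. Let $\widetilde{\nabla} f(x_A) + u_{\bar{A}} = u_A$ where $\widetilde{\nabla} f(x_A) \in \partial f(x_A)$ and $u_{\bar{A}} \in \bar{A}x_A$. Likewise let $\widetilde{\nabla} g(x_B) + u_{\bar{B}} = u_B$ where $\widetilde{\nabla} g(x_B) \in \partial g(x_B)$ and $u_{\bar{B}} \in \bar{B}x_B$. Then for all $x \in \operatorname{dom}(f) \cap \operatorname{dom}(g)$, we have 

$$\begin{aligned}
&2\gamma\lambda\Big(f(x_A) + g(x_B) + h(x_B) - (f + g + h)(x) + S_f(x_f, x) + S_g(x_g, x) + S_h(x_g, x) \\
&\qquad\quad + \langle x_A - x, u_{\bar{A}}\rangle + \langle x_B - x, u_{\bar{B}} + \bar{C}x_B\rangle\Big) \\
&\quad\le \|z - x\|^2 - \|z^+ - x\|^2 + \left(1 - \frac{2}{\lambda}\right)\|z - z^+\|^2 + 2\gamma\langle z - z^+, \nabla h(x_B) + \bar{C}x_B\rangle. 
\end{aligned} \tag{D.13}$$

Proof Equation (D.13) is a direct consequence of Proposition D.2 together with the following inequality: 

$$f(x_f) + g(x_g) + h(x_g) - (f + g + h)(x) \le \langle x_f - x, \widetilde{\nabla} f(x_f)\rangle + \langle x_g - x, \widetilde{\nabla} g(x_g) + \nabla h(x_g)\rangle - S_f(x_f, x) - S_g(x_g, x) - S_h(x_g, x).$$

□ 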
D.1 General case: convergence rates of upper and lower bounds 


We will prove the most general rates by showing how fast the upper and lower bounds in Proposition D.2 converge. Then we will deduce convergence rates. Thus, in this section we set 

$$\kappa_1^k(\lambda, x) = \|z^k - x\|^2 - \|z^{k+1} - x\|^2 + \left(1 - \frac{2}{\lambda}\right)\|z^k - z^{k+1}\|^2 + 2\gamma\langle z^k - z^{k+1}, Cx_B^k\rangle;$$
$$\kappa_2^k(\lambda, x^*) = \|z^k - z^*\|^2 - \|z^{k+1} - z^*\|^2 + \left(1 - \frac{2}{\lambda}\right)\|z^k - z^{k+1}\|^2 + 2\gamma\langle z^k - z^{k+1}, Cx_B^k - Cx^*\rangle,$$

where $\lambda > 0$, $z^*$ is a fixed point of $T$, $x^* = J_{\gamma B}(z^*)$, and $x \in \mathcal{H}$. 

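Theorem D.1 (Nonergodic convergence rates of bounds) Let $(z^j)_{j \ge 0}$ be generated by Equation (1.4) with $\varepsilon \in (0,1)$, $\gamma \in (0, 2\beta\varepsilon)$, $\alpha = 1/(2 - \varepsilon) < 2\beta/(4\beta - \gamma)$, and $(\lambda_j)_{j \ge 0} \subseteq (0, 1/\alpha)$. Let $z^*$ be a fixed point of $T$, let $x^* = J_{\gamma B}(z^*)$, and let $x \in \mathcal{H}$. Assume that $\tau := \inf_{j \ge 0} \lambda_j(1 - \alpha\lambda_j)/\alpha > 0$. Then for all $k \ge 0$, 

$$\kappa_1^k(1, x) \le \frac{2\left(\|z^* - x\| + (1 + \gamma/\beta)\|z^0 - z^*\| + \gamma\|Cx^*\|\right)\|z^0 - z^*\|}{\sqrt{\tau(k+1)}} \quad\text{and}\quad |\kappa_1^k(1, x)| = o\left(\frac{1 + \|x\|}{\sqrt{k+1}}\right); \tag{D.14}$$

$$\kappa_2^k(1, x^*) \le \frac{2(1 + \gamma/\beta)\|z^0 - z^*\|^2}{\sqrt{\tau(k+1)}} \quad\text{and}\quad |\kappa_2^k(1, x^*)| = o\left(\frac{1}{\sqrt{k+1}}\right). \tag{D.15}$$

We also have the following lower bound: 

$$\langle x_B^k - x_A^k, u_B^* + Cx^*\rangle \ge -\frac{\|z^0 - z^*\|\,\|u_B^* + Cx^*\|}{\sqrt{\tau(k+1)}} \quad\text{and}\quad |\langle x_B^k - x_A^k, u_B^* + Cx^*\rangle| = o\left(\frac{1}{\sqrt{k+1}}\right). \tag{D.16}$$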

Proof Fix $k \ge 0$. Observe that 

$$\|Cx_B^k\| \le \|Cx_B^k - Cx^*\| + \|Cx^*\| \le \frac{1}{\beta}\|x_B^k - x^*\| + \|Cx^*\| \le \frac{1}{\beta}\|z^k - z^*\| + \|Cx^*\| \le \frac{1}{\beta}\|z^0 - z^*\| + \|Cx^*\|$$

by the $(1/\beta)$-Lipschitz continuity of $C$, the nonexpansiveness of $J_{\gamma B}$, and the monotonicity of the sequence $(\|z^j - z^*\|)_{j \ge 0}$ (see Part 1 of Theorem 3.1). Thus, 

$$\begin{aligned}
|\kappa_1^k(1, x)| &\overset{(1.12)}{=} \left|2\langle z^{k+1} - x, z^k - z^{k+1}\rangle + 2\gamma\langle z^k - z^{k+1}, Cx_B^k\rangle\right| \\
&\le \frac{2\|z^{k+1} - x\|\|z^0 - z^*\| + (2\gamma/\beta)\|z^0 - z^*\|^2 + 2\gamma\|Cx^*\|\|z^0 - z^*\|}{\sqrt{\tau(k+1)}} \\
&\le \frac{\left(2\|z^* - x\| + (2 + 2\gamma/\beta)\|z^0 - z^*\| + 2\gamma\|Cx^*\|\right)\|z^0 - z^*\|}{\sqrt{\tau(k+1)}}, 
\end{aligned}$$

where the bound in the second inequality follows from Cauchy-Schwarz and the upper bound in Part 7 of Theorem 3.1, and the last inequality follows because $\|z^{k+1} - x\| \le \|z^{k+1} - z^*\| + \|z^* - x\| \le \|z^0 - z^*\| + \|z^* - x\|$ (see Part 1 of Theorem 3.1). The little-o rate follows because $\|z^k - z^{k+1}\| = o(1/\sqrt{k+1})$ by Part 7 of Theorem 3.1. 

The proof of Equation (D.15) follows nearly the same reasoning as the proof of Equation (D.14). Thus, we omit the proof. Next, because $x_B^k - x_A^k = z^k - Tz^k$ (see Lemma 3.1), we have 

$$|\langle z^k - Tz^k, u_B^* + Cx^*\rangle| \le \frac{\|z^0 - z^*\|\,\|u_B^* + Cx^*\|}{\sqrt{\tau(k+1)}}$$

by Part 7 of Theorem 3.1. The little-o rate follows because $\|z^k - z^{k+1}\| = o(1/\sqrt{k+1})$ by Part 7 of Theorem 3.1. □ 

We now prove two ergodic results. 
Theorem D.2 (Ergodic convergence rates of bounds for Equation (1.6)) Let $(z^j)_{j \ge 0}$ be generated by Equation (1.4) with $\varepsilon \in (0,1)$, $\gamma \in (0, 2\beta\varepsilon)$, $\alpha = 1/(2 - \varepsilon) < 2\beta/(4\beta - \gamma)$, and $(\lambda_j)_{j \ge 0} \subseteq (0, 1/\alpha]$. Let $z^*$ be a fixed point of $T$, let $x^* = J_{\gamma B}(z^*)$, and let $x \in \mathcal{H}$. Then for all $k \ge 0$, 

(D.17) 
(D.18) 

We also have the following lower bound: 


k 

1 —- < 


(2pe-i) I 


- 47 ||z° — z*||||Cx* 


£i = 0 i=0 

2^2 =0 2 = 0 Z ^2 = 0 


£?= 0 E 


5 ^ 2=0 ^ 2 = 0 

In addition, the following feasibility bound holds: 


1 k 

- —'E, X ii x B ~ X A’ U *B +Cx*) > 


£? =0 E 


£i=0 i=0 

Proof Fix k > 0. We first prove the feasibility bound: 

k 

2^2 = 0 A * 2=0 




2 ||z° - z*| 

Eto^ 


(D.19) 


(D.20) 


“ ELo E 


E^-* i+1 ) 

i=0 


z°-z fe +l|| 2 ||z°-z* 
< 


Eto^ " EtoE 


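where the last inequality follows from $\|z^0 - z^{k+1}\| \le \|z^0 - z^*\| + \|z^{k+1} - z^*\| \le 2\|z^0 - z^*\|$. 

Let $\eta_k = 2/\lambda_k - 1$. Note that $\eta_k > 0$, by assumption. In addition, $1/\eta_k = \lambda_k/(2 - \lambda_k) \le \lambda_k/\varepsilon$. Thus, we have 

$$\begin{aligned}
2\gamma\langle z^k - z^{k+1}, Cx_B^k\rangle &= 2\gamma\langle z^k - z^{k+1}, Cx_B^k - Cx^*\rangle + 2\gamma\langle z^k - z^{k+1}, Cx^*\rangle \\
&\le \eta_k\|z^k - z^{k+1}\|^2 + \frac{\gamma^2}{\eta_k}\|Cx_B^k - Cx^*\|^2 + 2\gamma\langle z^k - z^{k+1}, Cx^*\rangle. 
\end{aligned}$$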

Thus, for all k > 0, we have 

k k 

y: K\(\i,x) < || Z° - x \\ 2 + (— Vi\\z t+1 - z% II 2 + 27(2* - 2: i+1 ,C^)) 

=0 

1 + E (^l|Cs* B - Cx *|| 2 + 2 7 {z i - z i+1 ,Cx*/j 


(D ' 21) „ 0 
< z° — x\\ 


< Ik - ill 


< k u -1 


£ 7 ( 2 ^ - 7 /e) 


|z° - z *|| 2 + 2 7 (z° - z k+1 ,Cx*) 


7 


(2/3e - 7 ) 


|z° - z*|| 2 + 4 7 |k°- 2 * 111101 * 11 . 


(D.21) 


where the third inequality follows from Part 4 of Theorem 3.1 and the fourth inequality follows because $\|z^0 - z^{k+1}\| \le \|z^0 - z^*\| + \|z^{k+1} - z^*\| \le 2\|z^0 - z^*\|$. 

The proof of Equation (D.18) follows nearly the same reasoning as the proof of Equation (D.17). Thus, we omit the proof. Finally, Equation (D.19) follows directly from Cauchy-Schwarz and Equation (D.20). □ 

Theorem D.3 (Ergodic convergence rates of bounds for Equation (1.7)) Let $(z^j)_{j \ge 0}$ be generated by Equation (1.4) with $\varepsilon \in (0,1)$, $\gamma \in (0, 2\beta\varepsilon)$, $\alpha = 1/(2 - \varepsilon) < 2\beta/(4\beta - \gamma)$, and $\lambda_j \equiv \lambda \in (0, 1/\alpha]$. Let $z^*$ be a fixed point of $T$, let $x^* = J_{\gamma B}(z^*)$, and let $x \in \mathcal{H}$. Then for all $k \ge 0$, 

(D.22) 

(D.23) 

We also have the following lower bound: 


(k + l)(k + 2) 

2 


^(i + l)rei(A,x) < 


4 = 0 

k 


2 (2\\z* - x|| 2 + (2 + (2^) lk° - **l| 2 + 10||z° - 2*||||CV 

k + 1 


(k + 1 )(k + 2) i=Q 


X)(i + l)i4(A,**)< 


2 < 1+ (^) 


k + 1 


2 

(fc + l)(fc + 2) 


k 

+ l)(as^ - 4 ,«b +Ci*> > 

i=0 


— 5||z 0 - 2* 
A (fc + 1) 


(D.24) 


/n addition, the following feasibility bound holds: 


2 

(fc+ !)(*: +2) 


fc 


5Z( i + l)(*s — x a) 

i=0 


^ 5||z° - 2* 
- A(fc+ 1) 


(D.25) 


Proof Fix k > 0. We first prove the feasibility bound: 


(■k + l)(fc + 2) i=Q 


1 )( X B - 




(A: + l)(fc + 2) ,_ 0 
2 


(fc + l)(fc + 2) 
2 


(fc + l)(fc + 2) i=Q 
X] ((2 i+1 - 2*) + (i + l)(z* - 2*) - (i + 2)(2 i + 1 - 2*)) 

i=0 

(X> i+1 - 2 ‘) + ( 2 ° - 2 *) - ( fc + 2 )( zfe+1 -**)) 


(fc + l)(fc + 2)§ 1 


2 || 2 ° - 2 * 


2 || 2 ° - 2 * 


(k + l)(fc + 2) 
2 || 2 ° - 2 *|| 


(*+!) 


(k + 2) (k + l)(fe + 2) (fc+1) 

2|l 2° — 2*11 (2 + 5||z° — 2*11 


(fc + 1) 


k+ 1 


(D.26) 


where we use the bound $\|z^k - z^*\| \le \|z^0 - z^*\|$ for all $k \ge 0$ (see Part 1 of Theorem 3.1). The bound then follows because $\lambda(x_B^k - x_A^k) = z^k - z^{k+1}$ for all $k \ge 0$ (Lemma 3.1). 

We proceed as in the proof of Theorem D.2 (which is where $\eta_i := 2/\lambda_i - 1$ is defined): 

k r. k 




{k -(- 2)(k + 1) 


i=0 


±: C (« +dii** - *n a -«+DH--+ 1 - -n a ) 

-i— n A 


(fc + 2 )(fc + l) __ 

+ (i + 1) (-Vi\ k i+1 - zil 2 + 2 7 < Z i - z i+1 ,Cx\ 5 ) 
2 


< 


(fc + 2)(fc + 1) i=0 

( D . 21 ) / 


X) ((lk I+1 - z || 2 + (i + l)||z l - x\\ 2 - (i + 2 )|| 2 l+1 - xf) 

1—n A 


+ (i + 1) (-—||Cx^ — Cx *|| 2 4- 27 ( 2 * — 2 l+1 , Cx ■*) 


(fc + 2)(fc + 1) i=Q 


y 11 z i — x|| 2 h—-— y —— ncx^ — Cx* 

" k + 2^-' £ " B 11 

1=0 


y2-y(i + l)(z i -z i + 1 ,Cx*) 


(fc + 2 )(k + 1) ,_ 0 

k 

< 7 —%— y ( 2 i^ i - 2 *n 2 

- (fc + 2 )(* + l) 

(D. 26 ) 20 7 ||2° - z* || || Cx* || 


27 11,0 _ _*||2 

+ 2 ||2*-Xf)+ <2 ^)" " 

" " 7 k+2 


k + 1 


2 ( 2 ||2* - z || 2 + (2 + ||z° - 2 *|| 2 + 10||2° - 2»||||Cx < 

k + 1 

The proof of Equation (D.23) follows nearly the same reasoning as the proof of Equation (D.22). Thus, we omit the proof. 
Finally, Equation (D.24) follows directly from Cauchy Schwarz and Equation (D.25). □ 


D.2 General case: Rates of function values and variational inequalities 

In this section, we use the convergence rates of the upper and lower bounds derived in Theorems D.1, D.2, and D.3 to deduce convergence rates of function values and variational inequalities. All of the convergence rates have the following orders: 

$$\text{Nonergodic: } o\left(\frac{1}{\sqrt{k+1}}\right) \qquad\text{and}\qquad \text{Ergodic: } O\left(\frac{1}{k+1}\right).$$

We work with three model problems. 

— Most general: $A = \partial f + \bar{A}$, $B = \partial g + \bar{B}$ and $C = \nabla h + \bar{C}$ where $f$, $g$ and $h$ are functions and $\bar{A}$, $\bar{B}$ and $\bar{C}$ are monotone operators. See Corollary D.2 for our assumptions about this case, and see Corollary D.4 for the nonergodic convergence rate of the variational inequality associated to this problem. Note that for variational inequalities, only upper bounds are important, because we only wish to make certain quantities negative. 

— Subdifferential + Skew: We use the same set up as above, except we assume that $\bar{A}$ and $\bar{B}$ are skew linear mappings (i.e., $\bar{A}^* = -\bar{A}$ and $\bar{B}^* = -\bar{B}$) and $\bar{C} = 0$. See Corollaries D.6 and D.8 for the ergodic convergence rate of the variational inequality associated to this problem. This inclusion problem arises in primal-dual operator-splitting algorithms. 

— Functions: We assume that $\bar{A} = \bar{B} = \bar{C} = 0$. See Corollary D.3 for the nonergodic convergence rate and see Corollaries D.5 and D.7 for the ergodic convergence rates of the function values associated to our method. 

Note that by [24, Theorem 11], all of the convergence rates below are sharp (in terms of order, but not necessarily in terms 
of constants). In addition, they generalize some of the known convergence rates provided in [24,22,23] for Douglas-Rachford 
splitting, forward-Douglas-Rachford splitting, and the primal-dual forward-backward splitting, Douglas-Rachford splitting, and 
the proximal-point algorithms. 

The following fact will be used several times: 
Lemma D.1 Suppose that $(z^j)_{j \ge 0}$ is generated by Equation (1.4) and $\gamma \in (0, 2\beta)$. Let $z^*$ be a fixed point of $T$ and let $x^* = J_{\gamma B}(z^*)$. Then $(x_A^j)_{j \ge 0}$ and $(x_B^j)_{j \ge 0}$ are contained within the closed ball $B(x^*, (1 + \gamma/\beta)\|z^0 - z^*\|)$. 

Proof Fix $k \ge 0$. Observe that 

$$\|x_B^k - x^*\| = \|J_{\gamma B}(z^k) - J_{\gamma B}(z^*)\| \le \|z^k - z^*\| \le \|z^0 - z^*\|$$

by Part 1 of Theorem 3.1. Similarly, 

$$\|x_A^k - x^*\| \le \|\operatorname{refl}_{\gamma B}(z^k) - \operatorname{refl}_{\gamma B}(z^*) + \gamma Cx^* - \gamma Cx_B^k\| \le \|z^k - z^*\| + \frac{\gamma}{\beta}\|z^k - z^*\| \le \left(1 + \frac{\gamma}{\beta}\right)\|z^0 - z^*\|.$$

□ 


Corollary D.3 (Nonergodic convergence of function values) Suppose that $(z^j)_{j \ge 0}$ is generated by Equation (1.4), with $A = \partial f$, $B = \partial g$ and $C = \nabla h$. Let the assumptions be as in Theorem D.1. Then the following convergence rates hold: 

1. For all $k \ge 0$, we have 

$$\begin{aligned}
-\frac{\|z^0 - z^*\|\,\|u_B^* + Cx^*\|}{\sqrt{\tau(k+1)}} &\le f(x_f^k) + g(x_g^k) + h(x_g^k) - (f + g + h)(x^*) \\
&\le \frac{\left(\|z^* - x^*\| + (1 + \gamma/\beta)\|z^0 - z^*\| + \gamma\|\nabla h(x^*)\|\right)\|z^0 - z^*\|}{\gamma\sqrt{\tau(k+1)}} 
\end{aligned}$$

and 

$$\left|f(x_f^k) + g(x_g^k) + h(x_g^k) - (f + g + h)(x^*)\right| = o\left(\frac{1}{\sqrt{k+1}}\right).$$

2. Suppose that $f$ is $L$-Lipschitz continuous on the closed ball $B(0, (1 + \gamma/\beta)\|z^0 - z^*\|)$. Then the following convergence rate holds: 

$$\begin{aligned}
0 &\le f(x_g^k) + g(x_g^k) + h(x_g^k) - (f + g + h)(x^*) \\
&\le \frac{\left(\|z^* - x^*\| + (1 + \gamma/\beta)\|z^0 - z^*\| + \gamma\|\nabla h(x^*)\|\right)\|z^0 - z^*\| + \gamma L\|z^0 - z^*\|}{\gamma\sqrt{\tau(k+1)}} 
\end{aligned}$$

and 

$$0 \le f(x_g^k) + g(x_g^k) + h(x_g^k) - (f + g + h)(x^*) = o\left(\frac{1}{\sqrt{k+1}}\right).$$

Proof Fix $k \ge 0$. 

Part 1: By Corollary D.1, we have 

$$\langle x_B^k - x_A^k, u_B^* + Cx^*\rangle \le f(x_f^k) + g(x_g^k) + h(x_g^k) - (f + g + h)(x^*) \le \frac{1}{2\gamma}\kappa_1^k(1, x^*).$$

Thus, the convergence rates follow directly from Theorem D.1. 

Part 2: Note that $f(x_g^k) - f(x_f^k) \le L\|x_f^k - x_g^k\|$ by Lemma D.1. Because $x_g^k - x_f^k = z^k - Tz^k$, we have 

$$\begin{aligned}
f(x_g^k) + g(x_g^k) + h(x_g^k) - (f + g + h)(x^*) &\le f(x_f^k) + g(x_g^k) + h(x_g^k) - (f + g + h)(x^*) + L\|x_f^k - x_g^k\| \\
&\le f(x_f^k) + g(x_g^k) + h(x_g^k) - (f + g + h)(x^*) + \frac{L\|z^0 - z^*\|}{\sqrt{\tau(k+1)}}, 
\end{aligned}$$

where the last inequality follows from Part 7 of Theorem 3.1. Thus, the rate follows by Part 1. □ 

Corollary D.4 (Nonergodic convergence of variational inequalities) Suppose that $(z^j)_{j \ge 0}$ is generated by Equation (1.4), with $A = \partial f + \bar{A}$, $B = \partial g + \bar{B}$ and $C = \nabla h + \bar{C}$ as in Corollary D.2. Let the assumptions be as in Theorem D.1. Then the following convergence rates hold: 

1. For all $k \ge 0$ and $x \in \operatorname{dom}(f) \cap \operatorname{dom}(g)$, we have 

$$\begin{aligned}
&f(x_A^k) + g(x_B^k) + h(x_B^k) - (f + g + h)(x) + \langle x_A^k - x, u_{\bar{A}}^k\rangle + \langle x_B^k - x, u_{\bar{B}}^k + \bar{C}x_B^k\rangle \\
&\quad\le \frac{\left(\|z^* - x\| + (1 + \gamma/\beta)\|z^0 - z^*\| + \gamma\|Cx^*\|\right)\|z^0 - z^*\|}{\gamma\sqrt{\tau(k+1)}}. 
\end{aligned}$$

2. Suppose that $f$ and $\bar{A}$ are $L_f$ and $L_{\bar{A}}$-Lipschitz continuous respectively on the closed ball $B(0, (1 + \gamma/\beta)\|z^0 - z^*\|)$. Then the following convergence rate holds: For all $k \ge 0$ and $x \in \operatorname{dom}(f) \cap \operatorname{dom}(g)$, we have 

f ( x %) + 9( x b ) + M x b) ~ (/ + 9 + /*)(*) + — x > Ax b + + C^b) 

^ (Ik* - x*|| + (1 + 7/fll|z° - *1 + 7||Cx*||)|| g ° - - z*ll 

7 \/l( fc + 1) 

(1 + L a )\\z° - z* || ((1 +7//3)(l + L A )||2° - z*|| + II Ax* II + ||x* - x|||) 

\/r(fc + 1) 

and 

f( x B ) + 9( x b ) + M x b) ~ (/ + £ + ^X®) + ( x s — ^ — ° ^ ^ |J_ • 


Proof Fix $k \ge 0$.

Part 1: By Corollary D.2, we have 

f( x A ) + + h(x%) ~ (/ + £ + ^00*0 + (^A — u “^) + ( x b — ^ 5 + C x b) < — «i ( 1 , #) 

Thus, the convergence rates follow directly from Theorem D.l. 

Part 2: Note that $f(x_B^k) - f(x_A^k) \le L_f\|x_A^k - x_B^k\|$ by Lemma D.1. Because $x_B^k - x_A^k = z^k - Tz^k$, we have

$$\begin{aligned}
f(x_B^k) + g(x_B^k) + h(x_B^k) - (f+g+h)(x) &\le f(x_A^k) + g(x_B^k) + h(x_B^k) - (f+g+h)(x) + L_f\|x_A^k - x_B^k\| \\
&\overset{(7)}{\le} f(x_A^k) + g(x_B^k) + h(x_B^k) - (f+g+h)(x) + \frac{L_f\|z^0 - z^*\|}{\sqrt{\tau(k+1)}}.
\end{aligned}$$


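Also,

$$\begin{aligned}
\langle x_A^k - x, \bar A x_A^k\rangle &= \langle x_A^k - x_B^k, \bar A x_A^k\rangle + \langle x_B^k - x, \bar A x_A^k\rangle \\
&= \langle x_A^k - x_B^k, \bar A x_A^k\rangle + \langle x_B^k - x, \bar A x_A^k - \bar A x_B^k\rangle + \langle x_B^k - x, \bar A x_B^k\rangle \\
&\le \|x_A^k - x_B^k\|\|\bar A x_A^k\| + \|x_B^k - x\|\|\bar A x_A^k - \bar A x_B^k\| + \langle x_B^k - x, \bar A x_B^k\rangle \\
&\le \frac{(1 + L_{\bar A})\|z^0 - z^*\|\left(\|\bar A x_A^k\| + \|x_B^k - x\|\right)}{\sqrt{\tau(k+1)}} + \langle x_B^k - x, \bar A x_B^k\rangle,
\end{aligned}$$

and for $x^* = J_{\gamma B}(z^*)$,

$$\begin{aligned}
\|\bar A x_A^k\| + \|x_B^k - x\| &\le \|\bar A x_A^k - \bar A x^*\| + \|\bar A x^*\| + \|x_B^k - x^*\| + \|x^* - x\| \\
&\le (1+\gamma/\beta)(1 + L_{\bar A})\|z^0 - z^*\| + \|\bar A x^*\| + \|x^* - x\|.
\end{aligned}$$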


Thus, the rate follows by Part 1. □ 

Corollary D.5 (Ergodic convergence rates of function values for Equation (1.6)) Suppose that $(z^j)_{j \ge 0}$ is generated by Equation (1.4), with $A = \partial f$, $B = \partial g$ and $C = \nabla h$. Let the assumptions be as in Theorem D.2. For all $k \ge 0$, let $\bar x_f^k = (1/\sum_{i=0}^k \lambda_i)\sum_{i=0}^k \lambda_i x_f^i$, and let $\bar x_g^k = (1/\sum_{i=0}^k \lambda_i)\sum_{i=0}^k \lambda_i x_g^i$. Let $x^* = J_{\gamma B}(z^*)$. Then the following convergence rates hold:

1. For all $k \ge 0$, we have

— 2|| 2 ° - z* 


Cx 


E;=0 


1 < fix'}) + g(xg) + h(x k B ) -if + g + h)(x*) 

"~° -^ll 2 + J2^)W Z ° - ^‘ll 2 +47lk° - *-||||Oa;*|| 


< 


27Ef=o^ 


2. Suppose that f is L-Lipschitz continuous on the closed ball 5(0, (1 + ^/j3)\\z Q — z*||). Then the following convergence rate 
holds: 

o <f{x k a) + g{x k q ) + h(x k ) - (f + g + h)(x*) 


9 

v,* || 2 


(2/3e-7) 1 


1 +4 7 ||z° -z*||||Ca:*|j +4 7 L||z° -z*j| 


27 Etc Ai 


Proof Fix k > 0. 

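Part 1: We have the lower bound:

$$\begin{aligned}
f(\bar x_f^k) + g(\bar x_g^k) + h(\bar x_g^k) - (f+g+h)(x^*) &\ge \langle \bar x_f^k - x^*, \nabla f(x^*)\rangle + \langle \bar x_g^k - x^*, \nabla g(x^*) + \nabla h(x^*)\rangle \\
&= \langle \bar x_g^k - \bar x_f^k, \nabla g(x^*) + \nabla h(x^*)\rangle,
\end{aligned}$$

where $\nabla g(x^*) + \nabla f(x^*) + \nabla h(x^*) = 0$. In addition,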


/(®/) + 5(^g) + h ( x g) ~ if + g + h)ix*) < 


9 v-fc X 
^7 l^i=Q Ai 2=0 


by Jensen’s inequality and Corollary D.1. Thus, the convergence rate follows by Theorem D.2.

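Part 2: Note that $f(\bar x_g^k) - f(\bar x_f^k) \le L\|\bar x_f^k - \bar x_g^k\|$ by Lemma D.1 because $\overline{B(x^*, (1+\gamma/\beta)\|z^0 - z^*\|)}$ is convex, so the averaged sequences $(\bar x_f^j)_{j \ge 0}$ and $(\bar x_g^j)_{j \ge 0}$ must continue to lie in the ball. Therefore,

$$\begin{aligned}
f(\bar x_g^k) + g(\bar x_g^k) + h(\bar x_g^k) - (f+g+h)(x^*) &\le f(\bar x_f^k) + g(\bar x_g^k) + h(\bar x_g^k) - (f+g+h)(x^*) + L\|\bar x_f^k - \bar x_g^k\| \\
&\overset{(D.20)}{\le} f(\bar x_f^k) + g(\bar x_g^k) + h(\bar x_g^k) - (f+g+h)(x^*) + \frac{2L\|z^0 - z^*\|}{\sum_{i=0}^k \lambda_i}.
\end{aligned}$$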


Thus, the rate follows by Part 1. □ 

Corollary D.6 (Ergodic convergence of variational inequalities for Equation (1.6)) Suppose that $(z^j)_{j \ge 0}$ is generated by Equation (1.4), with $A = \partial f + \bar A$, $B = \partial g + \bar B$ and $C = \nabla h + \bar C$ as in Corollary D.2. In addition, suppose that $\bar A$ and $\bar B$ are skew linear maps (i.e., $\bar A^* = -\bar A$, and $\bar B^* = -\bar B$), and suppose that $\bar C = 0$. Let the assumptions be as in Theorem D.2. For all $k \ge 0$, let $\bar x_A^k = (1/\sum_{i=0}^k \lambda_i)\sum_{i=0}^k \lambda_i x_A^i$, and let $\bar x_B^k = (1/\sum_{i=0}^k \lambda_i)\sum_{i=0}^k \lambda_i x_B^i$. Then the following convergence rates hold:

1. For all $k \ge 0$ and $x \in \operatorname{dom}(f) \cap \operatorname{dom}(g)$, we have

f( x A.) + 9{ x b) + ^( x b) ~ (/ + 9 + h)(x) + (— x, Ax ^ + Bx\) 


(2/3e—7) 1 


! +4 7 ||z°-z*||||Cx*||+4 7 p|||M| 


27Ei=0^ 


2. Suppose that $f$ is $L_f$-Lipschitz continuous on the closed ball $\overline{B(0, (1+\gamma/\beta)\|z^0 - z^*\|)}$. Then the following convergence rate holds: For all $k \ge 0$ and $x \in \operatorname{dom}(f) \cap \operatorname{dom}(g)$, we have

fi x B ) + gi x B) + K x b) ~ if + g + h)ix) + i~x, Ax.g + Bx-g) 


( 2 / 3 e—7) I 


; +4 7 ||z° - z*\\\\Cx*\\ + 4 7 ||A||||x||||||z° - z*|| + 4 7 L / ||z° - z* 


27Eto^ 
Proof Fix k > 0.

Part 1: By Corollary D.2, we have 


I( x a) + 9( x b ) + M x b ) — (/ + 9 + + ( — Ab x b + & x b) 

< -t- y^XjK^Xux) + (x,A (x k B - x}Q) 


2 7 Ei=0 E i—0 

(D.20) ! * , 

— 9 \pfc — 2 J A*fCi(A*,as) - 

2 7 Ei=o ^ i—0 


2 ||A||||x||||z° -z* 

Eto^ 


where we use the self orthogonality of skew symmetric maps ($\langle \bar A y, y\rangle = \langle \bar B y, y\rangle = 0$ for all $y \in \mathcal{H}$) and Jensen’s inequality. Thus, the convergence rates follow directly from Theorem D.2.

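Part 2: Note that $f(\bar x_B^k) - f(\bar x_A^k) \le L_f\|\bar x_A^k - \bar x_B^k\|$ by Lemma D.1. Therefore,

$$\begin{aligned}
f(\bar x_B^k) + g(\bar x_B^k) + h(\bar x_B^k) - (f+g+h)(x) &\le f(\bar x_A^k) + g(\bar x_B^k) + h(\bar x_B^k) - (f+g+h)(x) + L_f\|\bar x_A^k - \bar x_B^k\| \\
&\overset{(D.20)}{\le} f(\bar x_A^k) + g(\bar x_B^k) + h(\bar x_B^k) - (f+g+h)(x) + \frac{2L_f\|z^0 - z^*\|}{\sum_{i=0}^k \lambda_i}.
\end{aligned}$$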


Thus, the rate follows by Part 1. □ 

Corollary D.7 (Ergodic convergence rates of function values for Equation (1.7)) Suppose that $(z^j)_{j \ge 0}$ is generated by Equation (1.4), with $A = \partial f$, $B = \partial g$ and $C = \nabla h$. Let the assumptions be as in Theorem D.3. For all $k \ge 0$, let $\bar x_f^k = (2/((k+1)(k+2)))\sum_{i=0}^k (i+1)x_f^i$, and let $\bar x_g^k = (2/((k+1)(k+2)))\sum_{i=0}^k (i+1)x_g^i$. Let $x^* = J_{\gamma B}(z^*)$. Then the following convergence rates hold:

1. For all $k \ge 0$, we have


-b\\z° - z*\\ 
A(fc + 1) 

2 ||z* — x* 
< - 


< /(*)) + g{x k ) + h{x k ) -(/ + <? + h)(x*) 

I 2 + ( 2 + lk° - **ll 2 + 10 ||*° - z*||||Cx* 

7 A (k -h 1 ) 


2. Suppose that f is L-Lipschitz continuous on the closed ball B( 0, (1 + 7 // 3 )|| z° — z*||). Then the following convergence rate 
holds: 


0 <f( x g) + g(x k ) + h{x k ) - (/ + g + ft)(a:*) 

2||z* - a:* || 2 + (2 + ||*° - **|| 2 + 10||z° - **||||Cx*|| + 5-yL f \\z° - z* 

7A(/c + 1) 


Proof Fix $k \ge 0$.

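Part 1: We have the lower bound:

$$\begin{aligned}
f(\bar x_f^k) + g(\bar x_g^k) + h(\bar x_g^k) - (f+g+h)(x^*) &\ge \langle \bar x_f^k - x^*, \nabla f(x^*)\rangle + \langle \bar x_g^k - x^*, \nabla g(x^*) + \nabla h(x^*)\rangle \\
&= \langle \bar x_g^k - \bar x_f^k, \nabla g(x^*) + \nabla h(x^*)\rangle,
\end{aligned}$$

where $\nabla g(x^*) + \nabla f(x^*) + \nabla h(x^*) = 0$. In addition,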


k 

NT 


fix)) + g(x£) + h(**) - (/ + 9 + ft)(x*) < 27A(fc+1)(fc + 2) Z _7 + **) 


by Jensen’s inequality and Corollary D.1. Thus, the convergence rate follows by Theorem D.3.

Part 2: Note that $f(\bar x_g^k) - f(\bar x_f^k) \le L\|\bar x_f^k - \bar x_g^k\|$ by Lemma D.1 because $\overline{B(x^*, (1+\gamma/\beta)\|z^0 - z^*\|)}$ is convex, so the averaged sequences $(\bar x_f^j)_{j \ge 0}$ and $(\bar x_g^j)_{j \ge 0}$ must continue to lie in the ball. Therefore,

$$\begin{aligned}
f(\bar x_g^k) + g(\bar x_g^k) + h(\bar x_g^k) - (f+g+h)(x^*) &\le f(\bar x_f^k) + g(\bar x_g^k) + h(\bar x_g^k) - (f+g+h)(x^*) + L\|\bar x_f^k - \bar x_g^k\| \\
&\overset{(D.20)}{\le} f(\bar x_f^k) + g(\bar x_g^k) + h(\bar x_g^k) - (f+g+h)(x^*) + \frac{5L\|z^0 - z^*\|}{\underline{\lambda}(k+1)}.
\end{aligned}$$

Thus, the rate follows by Part 1. □ 
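The ergodic sequences used in this subsection differ only in their weights: the average tied to Equation (1.6) weights each iterate by its relaxation parameter $\lambda_i$, while the one tied to Equation (1.7) uses the weights $i+1$, normalized by $(k+1)(k+2)/2$. The following Python sketch computes both averages. The function names and the list-of-lists representation of the iterates are illustrative assumptions and do not come from the original analysis.

```python
def ergodic_average(points, weights):
    """Return (1 / sum_i w_i) * sum_i w_i * x^i for vectors stored as lists."""
    total = sum(weights)
    dim = len(points[0])
    return [
        sum(w * p[d] for w, p in zip(weights, points)) / total
        for d in range(dim)
    ]


def average_lambda_weighted(points, lambdas):
    """Average with weights lambda_i (the form used with Equation (1.6))."""
    return ergodic_average(points, lambdas)


def average_linear_weighted(points):
    """Average with weights i + 1 (the form used with Equation (1.7)).

    The weights sum to (k + 1)(k + 2) / 2, which matches the normalizing
    factor 2 / ((k + 1)(k + 2)) in the corollary statements.
    """
    weights = [i + 1 for i in range(len(points))]
    return ergodic_average(points, weights)


# Hypothetical iterates x^0, x^1, x^2 in R^2.
iterates = [[1.0, 0.0], [0.5, 0.5], [0.25, 0.75]]
print(average_lambda_weighted(iterates, [1.0, 1.0, 1.0]))
print(average_linear_weighted(iterates))
```

The linearly weighted average puts more mass on later iterates, which is one way to read the faster $1/(k+1)$ denominators in the bounds for Equation (1.7).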

Corollary D.8 (Ergodic convergence of variational inequalities for Equation (1.7)) Suppose that $(z^j)_{j \ge 0}$ is generated by Equation (1.4), with $A = \partial f + \bar A$, $B = \partial g + \bar B$ and $C = \nabla h + \bar C$ as in Corollary D.2. In addition, suppose that $\bar A$ and $\bar B$ are skew linear maps (i.e., $\bar A^* = -\bar A$, and $\bar B^* = -\bar B$), and suppose that $\bar C = 0$. Let the assumptions be as in Theorem D.3. For all $k \ge 0$, let $\bar x_A^k = (2/((k+1)(k+2)))\sum_{i=0}^k (i+1)x_A^i$ and $\bar x_B^k = (2/((k+1)(k+2)))\sum_{i=0}^k (i+1)x_B^i$. Then the following convergence rates hold:

1. For all $k \ge 0$ and $x \in \operatorname{dom}(f) \cap \operatorname{dom}(g)$, we have

I( x a) + 9( x b) + ^( x b) ~ (/ + 9 + + ( — #> Ax^ + Bxg) 

2\\z* -x\\ 2 + (2+ (2/ J_ 7) ) ||z° - z*|| 2 + 10||z° -z*||||Ca:*|| + 57||A||||x||||||z° -z*|| 

7A(/c + 1) 


2. Suppose that $f$ is $L_f$-Lipschitz continuous on the closed ball $\overline{B(0, (1+\gamma/\beta)\|z^0 - z^*\|)}$. Then the following convergence rate holds: For all $k \ge 0$ and $x \in \operatorname{dom}(f) \cap \operatorname{dom}(g)$, we have

f( x B) + 9( x b) + h( x %) — (/ + g + h)(#) + (—+ Bx^) 

2\\z* - x\\ 2 + (2 + (2/ J_ 7) ) ||z° - z*|| 2 + 10|| 2 ° - z*||||Cx*|| + 57||A||||x||||||z° - z*|| + 57 Z 7 IIZ 0 - z*\\ 

^\(k + 1 ) 


Proof Fix $k \ge 0$.

Part 1: By Corollary D.2, we have 


/(4) + 90s) + fe (4) - (/ + 9 + /i)(x) + (-x, A s x| + Bx|) 
2 ^ 

< —-^--^(i+l)4(A,x) + <z,A(x!-4)> 


2 7 A(fc + l)(fc + 2) ^ 


(D. 25 ) 

< 


27A(fc + l)(fc + 2) i=0 


^ (A, x) + 


5||Al|l|x||||z° -z* 
A(fe + 1) 


where we use the self orthogonality of skew symmetric maps ($\langle \bar A y, y\rangle = \langle \bar B y, y\rangle = 0$ for all $y \in \mathcal{H}$) and Jensen’s inequality. Thus, the convergence rates follow directly from Theorem D.3.

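Part 2: Note that $f(\bar x_B^k) - f(\bar x_A^k) \le L_f\|\bar x_A^k - \bar x_B^k\|$ by Lemma D.1. Therefore,

$$\begin{aligned}
f(\bar x_B^k) + g(\bar x_B^k) + h(\bar x_B^k) - (f+g+h)(x) &\le f(\bar x_A^k) + g(\bar x_B^k) + h(\bar x_B^k) - (f+g+h)(x) + L_f\|\bar x_A^k - \bar x_B^k\| \\
&\overset{(D.25)}{\le} f(\bar x_A^k) + g(\bar x_B^k) + h(\bar x_B^k) - (f+g+h)(x) + \frac{5L_f\|z^0 - z^*\|}{\underline{\lambda}(k+1)}.
\end{aligned}$$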


Thus, the rate follows by Part 1. □

D.3 Strong monotonicity


In this section, we deduce the convergence rates of the terms $Q_\cdot(\cdot, \cdot)$ under general assumptions.

Corollary D.9 (Strong convergence) Suppose that $(z^j)_{j \ge 0}$ is generated by Equation (1.4). Let $z^*$ be a fixed point of $T$ and let $x^* = J_{\gamma B}(z^*)$. Then for all $k \ge 0$, the following convergence rates hold:

1. Nonergodic convergence: Let the assumptions of Theorem D.1 hold. Then

$$Q_A(x_A^k, x^*) + Q_B(x_B^k, x^*) + Q_C(x_B^k, x^*) \le \frac{(1+\gamma/\beta)\|z^0 - z^*\|^2}{\gamma\sqrt{\tau(k+1)}}$$

and $Q_A(x_A^k, x^*) + Q_B(x_B^k, x^*) + Q_C(x_B^k, x^*) = o(1/\sqrt{k+1})$.

2. “Best” iterate convergence: Let the assumptions of Theorem D.1 hold. Suppose that $\underline{\lambda} := \inf_{j \ge 0} \lambda_j > 0$. Then


min {Qa(x 1 a ,x*) + Q B {x l B ,x*) + Q c ( x b, x *)} < 
i= 0 ,- - ,fc 


i + 


( 2 / 9 e- 7 ) / 


27 A(fc + 1) 


and min i=0> ... jfc {Qa( x a , x *) + Qb{ x b’ x *) + Qc( x c’ x *)} = o(l/(k+ 1 )). 

3. Ergodic convergence for Equation (1.6): Let the assumptions for Theorem D.2 hold. Then


f i -j_ 7 

V' \ (Qa( x A’ x *) + Qb( x B’ x *) + Qc{ x %, x *j) < - (2/j " 


y^/c \ Z—/ 2 \ A -o’ ' ^ ° v o’ /y — q y^/c \ 

zZi=o i=o ^7 Z-/i=o A 

4. Ergodic convergence for Equation (1.7): Let the assumptions for Theorem D.3 hold. Then


(k + 1 )(k + 2 ) i=Q 


K 

y~!(» + l) ( Qa( x a> x *) + Qb( x %, x *) + Qci x C ’ x *)) < 


( 1+ 


(2/3e — 7 ) 


7A(/c + 1) 


Proof The “best” iterate convergence result follows [24, Lemma 3] because a{ x \-> x *)+Qb( x bi x *)~^Qc( x1 b^ #*)) < 

ESo 27 A<(Qa(®^j + Qb(^’ x *) + Qc(^b>^*)) < ESo Ai« 2 (Ai,^*) < (1 + 7 /( 2 / 5 e - 7 )) by the upper bounds in Equa¬ 

tions (D. 8 ) and (D.18). 

The rest of the results follow by combining the upper bound in Equation (D.8) with the convergence rates in Theorems D.1, D.2, and D.3. □

At first glance it may seem that the ergodic bounds in Corollary D.9 are not meaningful. However, whenever $\mu_A > 0$, we can apply Jensen’s inequality to show that

$$\sum_{i=0}^k \nu_i Q_A(x_A^i, x^*) \ge \mu_A \Big\|\sum_{i=0}^k \nu_i x_A^i - x^*\Big\|^2$$

for any positive sequence of stepsizes $(\nu_j)_{j=0}^k$ such that $\sum_{i=0}^k \nu_i = 1$. Thus, the ergodic bounds really prove strong convergence rates for the ergodic iterates generated by Equations (1.6) and (1.7).
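The Jensen step behind this remark is the convexity of the squared norm: for weights that sum to one, the weighted average of squared distances to a point dominates the squared distance of the weighted average. The sketch below checks that gap numerically. The helper names and the sample points are hypothetical, and the constant $\mu_A$ is left out because it only rescales both sides.

```python
def squared_distance(x, y):
    """Squared Euclidean distance between two vectors stored as lists."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def jensen_gap(points, weights, target):
    """sum_i nu_i ||x^i - target||^2 minus ||sum_i nu_i x^i - target||^2.

    Assumes the weights are positive and sum to one; the gap is then
    nonnegative by convexity of the squared norm.
    """
    dim = len(target)
    average = [sum(w * p[d] for w, p in zip(weights, points)) for d in range(dim)]
    lhs = sum(w * squared_distance(p, target) for w, p in zip(weights, points))
    return lhs - squared_distance(average, target)


# Hypothetical iterates and uniform weights.
print(jensen_gap([[2.0, 0.0], [0.0, 2.0]], [0.5, 0.5], [0.0, 0.0]))
```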


D.4 Lipschitz differentiability 

In this section, we focus on function minimization. In particular, we let $A = \partial f$, $B = \partial g$, and $C = \nabla h$, where $f$, $g$ and $h$ are closed, proper, and convex, and $\nabla h$ is $(1/\beta)$-Lipschitz. We make the following assumption regarding the regularity of $f$:

The gradient of $f$ is Lipschitz.

Under this assumption we will show that

the “best” objective error after $k$ iterations of Equation (1.4) has order $o(1/(k+1))$.

The techniques of this section can also be applied to show a similar result for g. The proof is somewhat more technical, so we 
omit it. 
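To make the "best" objective error concrete, here is a small Python sketch of the relaxed iteration on a toy problem. It assumes the usual form of the three-operator update, $x_g = \mathrm{prox}_{\gamma g}(z)$, $x_f = \mathrm{prox}_{\gamma f}(2x_g - z - \gamma\nabla h(x_g))$, $z^+ = z + \lambda(x_f - x_g)$, which is consistent with the identity $z - z^+ = \lambda(x_g - x_f)$ used in this appendix. The test problem ($f = \frac12\|x - a\|^2$, $g = \mu\|x\|_1$, $h = \frac12\|x - b\|^2$), the function names, and the parameter values are illustrative assumptions.

```python
import math


def soft_threshold(v, t):
    """Proximal map of t * ||.||_1 applied componentwise."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


def objective(x, a, b, mu):
    """(f + g + h)(x) for the toy problem described above."""
    f = 0.5 * sum((xi - ai) ** 2 for xi, ai in zip(x, a))
    g = mu * sum(abs(xi) for xi in x)
    h = 0.5 * sum((xi - bi) ** 2 for xi, bi in zip(x, b))
    return f + g + h


def three_operator_splitting(a, b, mu, gamma=1.0, lam=1.0, iters=200):
    """Run the relaxed iteration and track the best objective value at x_g."""
    z = [0.0] * len(a)
    x_g = list(z)
    best = float("inf")
    for _ in range(iters):
        x_g = soft_threshold(z, gamma * mu)               # prox of gamma * g
        grad_h = [xg - bi for xg, bi in zip(x_g, b)]       # gradient of h at x_g
        v = [2 * xg - zi - gamma * gh for xg, zi, gh in zip(x_g, z, grad_h)]
        x_f = [(vi + gamma * ai) / (1 + gamma) for vi, ai in zip(v, a)]  # prox of gamma * f
        z = [zi + lam * (xf - xg) for zi, xf, xg in zip(z, x_f, x_g)]
        best = min(best, objective(x_g, a, b, mu))
    return x_g, best


x, best = three_operator_splitting([3.0, -1.0, 0.2], [1.0, -2.0, 0.1], mu=1.0)
print(x, best)
```

For this toy problem the minimizer has the closed form $x^* = \mathrm{soft}((a+b)/2, \mu/2)$, so the output can be compared against it directly. Here $\nabla h$ is 1-Lipschitz, so $\beta = 1$ and the choices $\gamma = 1$, $\lambda = 1$ fall inside the parameter ranges used throughout the appendix.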

The following theorem will be used several times throughout our analysis. See [3, Theorem 18.15(iii)] for a proof.

Theorem D.4 (Descent theorem) Suppose that $f : \mathcal{H} \to (-\infty, \infty]$ is closed, convex, and differentiable. If $\nabla f$ is $(1/\beta_f)$-Lipschitz, then for all $x, y \in \operatorname{dom}(f)$, we have the upper bound

$$f(x) \le f(y) + \langle x - y, \nabla f(y)\rangle + \frac{1}{2\beta_f}\|x - y\|^2. \qquad \text{(D.27)}$$
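The descent bound $f(x) \le f(y) + \langle x - y, \nabla f(y)\rangle + \frac{1}{2\beta_f}\|x - y\|^2$ can be checked numerically on a simple case. The sketch below uses a diagonal quadratic $f(x) = \frac12\sum_i q_i x_i^2$ with $q_i > 0$, whose gradient is Lipschitz with constant $\max_i q_i$, so $\beta_f = 1/\max_i q_i$. The quadratic, the function name, and the sample points are assumptions chosen for illustration.

```python
def descent_gap(q, x, y):
    """Upper bound from the descent theorem minus f(x), for f(v) = 0.5 * sum q_i v_i^2.

    The result is nonnegative whenever the descent bound holds.
    """
    def f(v):
        return 0.5 * sum(qi * vi * vi for qi, vi in zip(q, v))

    grad_y = [qi * yi for qi, yi in zip(q, y)]
    beta_f = 1.0 / max(q)  # the gradient is (1 / beta_f)-Lipschitz
    inner = sum((xi - yi) * gi for xi, yi, gi in zip(x, y, grad_y))
    quad = sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / (2.0 * beta_f)
    return f(y) + inner + quad - f(x)


print(descent_gap([1.0, 4.0], [1.0, 2.0], [0.0, 0.0]))
```

For this quadratic the gap equals $\frac12\sum_i(\max_j q_j - q_i)(x_i - y_i)^2$, so it vanishes exactly along the direction of largest curvature.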


Proposition D.3 (Lipschitz differentiable upper bound) Suppose that $(z^j)_{j \ge 0}$ is generated by Equation (1.4). Then the following bounds hold: Suppose that $f$ is differentiable and $\nabla f$ is $(1/\beta_f)$-Lipschitz. Then


27 K(f + 9 + h)(x g ) — (f + g + h)(x*)) 


\\z — z*\\ 2 — II 2 + — z*\\ 2 + ( 1 


< 


+2-y(' s ^h(x g ) — ^7h(x*),z — 2 +) 
( 1 + "l -Pf . 

V + . 2/5 f 




2 — 2 — 2 n — 2 


+ - r*ll 2 . 


if 7 < Pf 

z - z+ 1| 2 ) 

+27 ( x + Tjy L ) ( Vfe ( x s) ~ 2 — 2+> if 7 > /?/. 


(D.28) 


Proof Because V/ is (l//3y)-Lipschitz, we have 


(D. 27 ) 

f(x g ) < f(xf) + (x g -Xf, V/(xy)> 
Sf(x f ,x*) ( > 1) ^-||V/(x/) — V/(x*)|| 2 . 



(D.29) 

(D.30) 


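By applying the identity $z^* - x^* = \gamma\nabla g(x^*) = -\gamma\nabla f(x^*) - \gamma\nabla h(x^*)$, the cosine rule (1.12), and the identity $z - z^+ = \lambda(x_g - x_f)$ (see Lemma 3.1) multiple times, we have

$$\begin{aligned}
&2\langle z - z^+, z^* - x^*\rangle + 2\gamma\lambda\langle x_g - x_f, \nabla f(x_f)\rangle \\
&\quad= 2\lambda\langle x_g - x_f, \gamma\nabla g(x^*) + \gamma\nabla f(x_f)\rangle \\
&\quad= 2\lambda\langle \gamma\nabla g(x_g) + \gamma\nabla h(x_g) + \gamma\nabla f(x_f), \gamma\nabla f(x_f) - \gamma\nabla f(x^*)\rangle - 2\langle z - z^+, \gamma\nabla h(x^*)\rangle \\
&\quad= \lambda\left(\|\gamma\nabla f(x_f) - \gamma\nabla f(x^*)\|^2 + \|x_g - x_f\|^2 - \|\gamma\nabla g(x_g) + \gamma\nabla h(x_g) - \gamma\nabla g(x^*) - \gamma\nabla h(x^*)\|^2\right) - 2\langle z - z^+, \gamma\nabla h(x^*)\rangle. \qquad \text{(D.31)}
\end{aligned}$$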


By Lemma 3.1 (i.e., 2 — 2 + = A(x g — xy)), we have 




(7~/3/) \ 

/3/A ; 
Therefore,


27-M(/ + 9 + h){x g ) - (/ + 9 + h)(x*)) 

( D .29) 

< 27 A {f(x s )+g(x.g) + h(x g ) - (/ + g + h)(x )) + 2'y\(x g - s/, V/( x f )) + -^-\\x g - x f \\~ 

Pf 

(D.ll) 

< || 2 : — z* || 2 — || 2 :”*" — z* || 2 + 2 (z — z+, z* — x*) + 27 A (x g — Xf, V f(xf)) 

+ ^1 — ^ \\z + - z\\ 2 + 2^(Vh(x g ), z — z + ) + -j ^\\ x 9 — x f II 2 _ 27 A Sf(xf,x*) 

< II 2 - 2 * II 2 - \\z + - 2 * II 2 + (l - fj \\z - z + \\ 2 + A (-j- + l) \\x g - Xf\\ 2 
+ A|| lVf{x s ) - 7 V/(z *)|| 2 + 27 (Vh(x g ) - S 7 h{x*),z - 2 +) - 2 1 \S s {x s ,x*) 

s' II * || 2 || + * || 2 - (1 1 (7 — Pf)\ || +m2 

< |p — 2: || — \\z^ — z || + ^1 h — p ^ j |p — z^\\ 

(D.30) 

+ 2 7 <V/i( % ) - Vh(x*),z - z+) + 7 A (7 - Pf)\\ V/(*/) - V/(z*)l| 2 - 
If $\gamma \le \beta_f$, then we can drop the last term. If $\gamma > \beta_f$, then we apply the upper bound in Equation (D.12) to get:


(D.32) 


7A(7-0/)l|V/(s / )-V/(x* 


( 7 -/ 3 /) 

2/3/ 


2 - 2*l| 2 - l|z + - -*H 2 + ( 1 - T I - 2 


+ 27 (Vh(x g ) — Vh.(x*), z — 2 + ) j . 


The result follows by using the above inequality in Equation (D.32) together with the following identity: 



(7 ~Pf) \ 

Pf* J 


(7 ~ Pf) 
2 Pf 



1 + 


7 -Pf 
2/3/ 


□ 


Theorem D.5 (“Best” objective error rate) Let $(z^j)_{j \ge 0}$ be generated by Equation (1.4) with $\gamma \in (0, 2\beta)$ and $\tau = \inf_{j \ge 0} \lambda_j(1 - \alpha\lambda_j)/\alpha > 0$. Then the following bound holds: If $f$ is differentiable and $\nabla f$ is $(1/\beta_f)$-Lipschitz, then

$$0 \le \min_{i=0,\dots,k}\left\{(f+g+h)(x_g^i) - (f+g+h)(x^*)\right\} = o\left(\frac{1}{k+1}\right).$$

Proof By [24, Part 4 of Lemma 3], it suffices to show that all of the upper bounds in Proposition D.3 are summable. In both of the cases, the alternating sequence (and any constant multiple) $(\|z^j - z^*\|^2 - \|z^{j+1} - z^*\|^2)_{j \ge 0}$ is clearly summable. In addition, we know that $(\|z^j - z^{j+1}\|^2)_{j \ge 0}$ is summable by Part 1 of Theorem 3.1, and every coefficient of this sequence in the two upper bounds is bounded (because $(\lambda_j)_{j \ge 0}$ is a bounded sequence). Thus, the part pertaining to $(\|z^j - z^{j+1}\|^2)_{j \ge 0}$ is summable.

Finally, we just need to show that $(\langle \nabla h(x_g^j) - \nabla h(x^*), z^j - z^{j+1}\rangle)_{j \ge 0}$ is summable. The Cauchy-Schwarz inequality and Young’s inequality for real numbers show that for all $k \ge 0$, we have

$$2\langle \nabla h(x_g^k) - \nabla h(x^*), z^k - z^{k+1}\rangle \le \|\nabla h(x_g^k) - \nabla h(x^*)\|^2 + \|z^k - z^{k+1}\|^2.$$

The second term is summable by the argument above, and the first term is summable by Part 4 of Theorem 3.1. □

Remark D.1 The order of convergence in Theorem D.5 is sharp [24, Theorem 12], and generalizes similar results known for Douglas-Rachford splitting, forward-backward splitting and forward-Douglas-Rachford splitting [24,23,25].

D.5 Linear convergence

In this section we show that

Equation (1.4) converges linearly whenever $(\mu_A + \mu_B + \mu_C)(1/L_A + 1/L_B) > 0$

where $L_A$ and $L_B$ are the Lipschitz constants of $A$ and $B$ and we follow the convention that $1/L_A = 0$ or $1/L_B = 0$ whenever $A$ or $B$ fail to be Lipschitz, respectively.

The first result of this section is an inequality that will help us deduce contraction factors for $T$ in Theorem D.6.


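Proposition D.4 Assume the setting of Theorem 3.1. In particular, let $\varepsilon \in (0,1)$, let $\gamma \in (0, 2\beta\varepsilon)$, let $\alpha = 1/(2 - \varepsilon)$, and let $\lambda \in (0, 1/\alpha)$. Let $z \in \mathcal{H}$ and let $z^+ = (1 - \lambda)z + \lambda Tz$. Let $z^*$ be a fixed point of $T$ and let $x^* = J_{\gamma B}(z^*)$. Let $x_A$ and $x_B$ be defined as in Lemma 3.1. Let $Q_A$, $Q_B$ and $Q_C$ be defined as in Proposition D.1. Then the following inequality holds:

$$\begin{aligned}
&\|z^+ - z^*\|^2 + \left(\frac{1}{\alpha\lambda} - 1\right)\|z - z^+\|^2 + 2\gamma\lambda Q_A(x_A, x^*) + 2\gamma\lambda Q_B(x_B, x^*) + 2\gamma\lambda Q_C(x_B, x^*) - \frac{\gamma^2\lambda}{\varepsilon}\|Cx_B - Cx^*\|^2 \\
&\quad\le \|z - z^*\|^2 \\
&\quad\le \min\Big\{ (1 + \gamma L_B)^2\|x_B - x^*\|^2,\ \ 3\left((1 + \gamma L_A)^2\|x_A - x^*\|^2 + \gamma^2\|Cx_B - Cx^*\|^2 + 4\|x_B - x_A\|^2\right), \\
&\qquad\qquad 3(1 + 2\gamma^2L_B^2)\left(\|x_A - x^*\|^2 + \|x_A - x_B\|^2\right), \\
&\qquad\qquad 4\left((1 + 2\gamma^2L_A^2)\|x_B - x^*\|^2 + \gamma^2\|Cx_B - Cx^*\|^2 + (1 + 2\gamma^2L_A^2)\|x_B - x_A\|^2\right) \Big\}. \qquad \text{(D.33)}
\end{aligned}$$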


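Proof Equation (D.33) shows that:

$$\|z^+ - z^*\|^2 + \left(\frac{2}{\lambda} - 1\right)\|z - z^+\|^2 + 2\gamma\lambda Q_A(x_A, x^*) + 2\gamma\lambda Q_B(x_B, x^*) + 2\gamma\lambda Q_C(x_B, x^*) \le \|z - z^*\|^2 + 2\gamma\langle z - z^+, Cx_B - Cx^*\rangle.$$

From Cauchy-Schwarz and Young’s inequality, we have

$$2\gamma\langle z - z^+, Cx_B - Cx^*\rangle \le \frac{\varepsilon}{\lambda}\|z - z^+\|^2 + \frac{\gamma^2\lambda}{\varepsilon}\|Cx_B - Cx^*\|^2.$$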


The lower bound now follows by rearranging.

The upper bound follows from the following bounds (where we take $L_A = \infty$ or $L_B = \infty$, respectively, whenever $A$ or $B$ fail to be Lipschitz):

$$\|z - z^*\|^2 = \|x_B + \gamma u_B - (x^* + \gamma u_B^*)\|^2 \le (1 + \gamma L_B)^2\|x_B - x^*\|^2;$$

$$\begin{aligned}
\|z - z^*\|^2 &= \|x_B + \gamma u_B - (x^* - \gamma u_A^* - \gamma Cx^*)\|^2 \\
&= \|x_B - \gamma(u_A + Cx_B) + \gamma(u_B + u_A + Cx_B) - (x^* - \gamma u_A^* - \gamma Cx^*)\|^2 \\
&= \|x_A - \gamma(u_A + Cx_B) + 2(x_B - x_A) - (x^* - \gamma u_A^* - \gamma Cx^*)\|^2 \\
&\le 3\left(\|x_A - \gamma u_A - (x^* - \gamma u_A^*)\|^2 + \gamma^2\|Cx_B - Cx^*\|^2 + 4\|x_B - x_A\|^2\right) \\
&\le 3\left((1 + \gamma L_A)^2\|x_A - x^*\|^2 + \gamma^2\|Cx_B - Cx^*\|^2 + 4\|x_B - x_A\|^2\right);
\end{aligned}$$

$$\begin{aligned}
\|z - z^*\|^2 &= \|x_B + \gamma u_B - (x^* + \gamma u_B^*)\|^2 \\
&= \|x_A + \gamma u_B - (x^* + \gamma u_B^*) + (x_B - x_A)\|^2 \\
&\le 3\left(\|x_A - x^*\|^2 + \gamma^2\|u_B - u_B^*\|^2 + \|x_A - x_B\|^2\right) \\
&\le 3\left(\|x_A - x^*\|^2 + \gamma^2L_B^2\|x_B - x^*\|^2 + \|x_A - x_B\|^2\right) \\
&\le 3(1 + 2\gamma^2L_B^2)\left(\|x_A - x^*\|^2 + \|x_A - x_B\|^2\right);
\end{aligned}$$

$$\begin{aligned}
\|z - z^*\|^2 &= \|x_B + \gamma u_B - (x^* + \gamma u_B^*)\|^2 \\
&= \|x_B - \gamma(u_A + Cx_B) - (x^* - \gamma(u_A^* + Cx^*)) + (x_B - x_A)\|^2 \\
&\le 4\left(\|x_B - x^*\|^2 + \gamma^2\|u_A - u_A^*\|^2 + \gamma^2\|Cx_B - Cx^*\|^2 + \|x_B - x_A\|^2\right) \\
&\le 4\left(\|x_B - x^*\|^2 + \gamma^2L_A^2\|x_A - x^*\|^2 + \gamma^2\|Cx_B - Cx^*\|^2 + \|x_B - x_A\|^2\right) \\
&\le 4\left((1 + 2\gamma^2L_A^2)\|x_B - x^*\|^2 + \gamma^2\|Cx_B - Cx^*\|^2 + (1 + 2\gamma^2L_A^2)\|x_B - x_A\|^2\right).
\end{aligned}$$

□ 


The following theorem proves linear convergence of Equation (1.4) whenever $(\mu_A + \mu_B + \mu_C)(1/L_A + 1/L_B) > 0$.

Theorem D.6 Assume the setting of Theorem 3.1. In particular, let $\varepsilon \in (0,1)$, let $\gamma \in (0, 2\beta\varepsilon)$, let $\alpha = 1/(2 - \varepsilon)$, and let $\lambda \in (0, 1/\alpha)$. Let $z \in \mathcal{H}$ and let $z^+ = (1 - \lambda)z + \lambda Tz$. Let $z^*$ be a fixed point of $T$ and let $x^* = J_{\gamma B}(z^*)$. Then the following inequality holds under each of the conditions below:

$$\|z^+ - z^*\| \le (1 - C(\lambda))^{1/2}\|z - z^*\|$$

where $C(\lambda) \in [0,1]$ is defined below under different scenarios.

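1. Suppose that $B$ is $L_B$-Lipschitz and $\mu_B$-strongly monotone. Then

$$C(\lambda) = \frac{2\gamma\lambda\mu_B}{(1 + \gamma L_B)^2}.$$

2. Suppose that $A$ is $L_A$-Lipschitz and $\mu_A$-strongly monotone. Then

$$C(\lambda) = \frac{\lambda}{3}\min\left\{\frac{2\gamma\mu_A}{(1 + \gamma L_A)^2},\ \frac{2\beta - \gamma/\varepsilon}{\gamma},\ \frac{\lambda}{4}\left(\frac{1}{\alpha\lambda} - 1\right)\right\}.$$

3. Suppose that $A$ is $\mu_A$-strongly monotone and $B$ is $L_B$-Lipschitz. Then

$$C(\lambda) = \frac{\lambda}{3(1 + 2\gamma^2L_B^2)}\min\left\{2\gamma\mu_A,\ \lambda\left(\frac{1}{\alpha\lambda} - 1\right)\right\}.$$

4. Suppose that $A$ is $L_A$-Lipschitz and $B$ is $\mu_B$-strongly monotone. Then

$$C(\lambda) = \frac{\lambda}{4}\min\left\{\frac{2\gamma\mu_B}{(1 + 2\gamma^2L_A^2)},\ \frac{2\beta - \gamma/\varepsilon}{\gamma},\ \frac{\lambda}{(1 + 2\gamma^2L_A^2)}\left(\frac{1}{\alpha\lambda} - 1\right)\right\}.$$

5. Suppose that $A$ is $L_A$-Lipschitz and $C$ is $\mu_C$-strongly monotone. Let $\eta \in (0,1)$ be large enough that $2\eta\beta > \gamma/\varepsilon$. Then

$$C(\lambda) = \frac{\lambda}{4}\min\left\{\frac{2\gamma\mu_C(1 - \eta)}{(1 + 2\gamma^2L_A^2)},\ \frac{2\eta\beta - \gamma/\varepsilon}{\gamma},\ \frac{\lambda}{(1 + 2\gamma^2L_A^2)}\left(\frac{1}{\alpha\lambda} - 1\right)\right\}.$$

6. Suppose that $B$ is $L_B$-Lipschitz and $C$ is $\mu_C$-strongly monotone. Let $\eta \in (0,1)$ be large enough that $2\eta\beta > \gamma/\varepsilon$. Then

$$C(\lambda) = \frac{2\gamma\lambda\mu_C(1 - \eta)}{(1 + \gamma L_B)^2}.$$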

Proof Each part of the proof is based on the following idea: If $a_0, \dots, a_n, b_0, \dots, b_n, c_0, \dots, c_n \in \mathbb{R}_{++}$ for some $n \ge 0$, and

$$\|z^+ - z^*\|^2 + \sum_{i=0}^n a_ic_i \le \|z - z^*\|^2 \le \sum_{i=0}^n a_ib_i,$$

then $\sum_{i=0}^n a_ib_i \le \max\{b_i/c_i \mid i = 0, \dots, n\}\sum_{i=0}^n a_ic_i$, so

$$\|z^+ - z^*\|^2 + \min\{c_i/b_i \mid i = 0, \dots, n\}\|z - z^*\|^2 \le \|z^+ - z^*\|^2 + \sum_{i=0}^n a_ic_i \le \|z - z^*\|^2.$$

Thus,

$$\|z^+ - z^*\| \le (1 - \min\{c_i/b_i \mid i = 0, \dots, n\})^{1/2}\|z - z^*\|.$$

In each case the terms $a_ic_i$ will be taken from the left hand side of Equation (D.33), and the terms $a_ib_i$ will be taken from the right of the same equation.

Part 1: We use the first upper bound in Equation (D.33) and set $a_0 = \|x_B - x^*\|^2$, $c_0 = 2\gamma\lambda\mu_B$, and $b_0 = (1 + \gamma L_B)^2$.

Part 2: We use the second upper bound in Equation (D.33) and set $a_0 = \|x_A - x^*\|^2$, $c_0 = 2\gamma\lambda\mu_A$, $b_0 = 3(1 + \gamma L_A)^2$, $a_1 = \|Cx_B - Cx^*\|^2$, $c_1 = \gamma\lambda(2\beta - \gamma/\varepsilon)$, $b_1 = 3\gamma^2$, $a_2 = \|x_B - x_A\|^2$, $c_2 = \lambda^2(1/(\lambda\alpha) - 1)$, and $b_2 = 12$.

Part 3: We use the third upper bound in Equation (D.33) and set $a_0 = \|x_A - x^*\|^2$, $c_0 = 2\gamma\lambda\mu_A$, $b_0 = 3(1 + 2\gamma^2L_B^2)$, $a_1 = \|x_A - x_B\|^2$, $c_1 = \lambda^2(1/(\lambda\alpha) - 1)$, and $b_1 = 3(1 + 2\gamma^2L_B^2)$.

Part 4: We use the fourth upper bound in Equation (D.33) and set $a_0 = \|x_B - x^*\|^2$, $c_0 = 2\gamma\lambda\mu_B$, $b_0 = 4(1 + 2\gamma^2L_A^2)$, $a_1 = \|Cx_B - Cx^*\|^2$, $c_1 = \gamma\lambda(2\beta - \gamma/\varepsilon)$, $b_1 = 4\gamma^2$, $a_2 = \|x_B - x_A\|^2$, $c_2 = \lambda^2(1/(\lambda\alpha) - 1)$, $b_2 = 4(1 + 2\gamma^2L_A^2)$.

Part 5: We use the fourth upper bound in Equation (D.33) and set $a_0 = \|x_B - x^*\|^2$, $c_0 = 2\gamma\lambda\mu_C(1 - \eta)$, $b_0 = 4(1 + 2\gamma^2L_A^2)$, $a_1 = \|Cx_B - Cx^*\|^2$, $c_1 = \gamma\lambda(2\eta\beta - \gamma/\varepsilon)$, $b_1 = 4\gamma^2$, $a_2 = \|x_B - x_A\|^2$, $c_2 = \lambda^2(1/(\lambda\alpha) - 1)$, $b_2 = 4(1 + 2\gamma^2L_A^2)$.

Part 6: We use the first upper bound in Equation (D.33) and set $a_0 = \|x_B - x^*\|^2$, $c_0 = 2\gamma\lambda\mu_C(1 - \eta)$, and $b_0 = (1 + \gamma L_B)^2$.

□ 
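The recipe used in each part of the proof is mechanical: pair up coefficients $c_i$ (from the lower bound) and $b_i$ (from the upper bound), take the smallest ratio $c_i/b_i$ as $C(\lambda)$, and the contraction factor is $(1 - C(\lambda))^{1/2}$. The Python sketch below implements that recipe and instantiates it for the first scenario ($B$ Lipschitz and strongly monotone). The function names and the sample constants are hypothetical and are not values from the paper.

```python
import math


def contraction_factor(c, b):
    """Return (1 - min_i c_i / b_i) ** 0.5 for positive coefficient lists c and b."""
    ratio = min(ci / bi for ci, bi in zip(c, b))
    return math.sqrt(1.0 - ratio)


def factor_case_1(gamma, lam, mu_B, L_B):
    """Scenario 1: a single pair c_0 = 2 * gamma * lam * mu_B, b_0 = (1 + gamma * L_B) ** 2."""
    return contraction_factor([2.0 * gamma * lam * mu_B], [(1.0 + gamma * L_B) ** 2])


# Hypothetical constants: gamma = lam = 1 and mu_B = L_B = 1.
rho = factor_case_1(1.0, 1.0, 1.0, 1.0)
print(rho)
# Iterations needed to shrink the distance to z* by a factor of 1e-6 at this rate.
print(math.ceil(math.log(1e-6) / math.log(rho)))
```

Because the factor only depends on the worst ratio, adding more $(c_i, b_i)$ pairs, as in scenarios 2 through 5, can only make the guaranteed rate slower.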

Remark D.2 Note that the contraction factors can be improved whenever $A$ or $B$ are known to be subdifferential operators of convex functions because the function $Q_\cdot(\cdot, \cdot)$ can be made larger with Proposition D.1. We do not pursue this here due to lack of space.

Remark D.3 Note that we can relax the conditions of Theorem D.6. Indeed, we only need to assume that $C$ is Lipschitz to derive linear convergence, not necessarily cocoercive. We do not pursue this extension here due to lack of space.


D.6 Arbitrarily slow convergence when $\mu_C\mu_A > 0$.

This section shows that the result of Theorem D.6 cannot be improved in the sense that we cannot expect linear convergence even if $C$ and $A$ are strongly monotone. The results of this section parallel similar results shown in [23, Section 6.1].

The main example

Let $\mathcal{H} = \ell_2^2(\mathbb{N}) = \mathbb{R}^2 \oplus \mathbb{R}^2 \oplus \cdots$. Let $R_\theta$ denote counterclockwise rotation in $\mathbb{R}^2$ by $\theta$ degrees. Let $e_0 := (1, 0)$ denote the standard unit vector, and let $e_\theta := R_\theta e_0$. Suppose that $(\theta_j)_{j \ge 0}$ is a sequence of angles in $(0, \pi/2]$ such that $\theta_i \to 0$ as $i \to \infty$. For all $i \ge 0$, let $c_i := \cos(\theta_i)$. We let

$$V := \mathbb{R}e_0 \oplus \mathbb{R}e_0 \oplus \cdots \quad\text{and}\quad U := \mathbb{R}e_{\theta_0} \oplus \mathbb{R}e_{\theta_1} \oplus \cdots.$$

Note that [2, Section 7] proves the projection identities

$$(P_U)_i = \begin{bmatrix} \cos^2(\theta_i) & \sin(\theta_i)\cos(\theta_i) \\ \sin(\theta_i)\cos(\theta_i) & \sin^2(\theta_i) \end{bmatrix} \quad\text{and}\quad (P_V)_i = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}. \qquad \text{(D.34)}$$
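The projection identities in (D.34) are the rank-one projectors $e_\theta e_\theta^T$ onto the line spanned by $e_\theta = (\cos\theta, \sin\theta)$, with $\theta = 0$ giving the block of $P_V$. The Python sketch below builds one such $2 \times 2$ block and checks the defining properties of a projector. The helper names and the sample angle are assumptions made for illustration.

```python
import math


def projector_onto_line(theta):
    """2x2 orthogonal projector onto the line spanned by (cos(theta), sin(theta))."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c, s * c], [s * c, s * s]]


def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


def apply2(a, v):
    """Apply a 2x2 matrix to a vector of length 2."""
    return [a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1]]


theta = 0.3  # hypothetical angle in (0, pi/2]
P_U = projector_onto_line(theta)
P_V = projector_onto_line(0.0)
print(P_U)
print(P_V)  # equals [[1, 0], [0, 0]]
```

Idempotence ($P^2 = P$) and invariance of $e_\theta$ under $P_U$ are the two properties one would check to confirm that a block is the claimed projector.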


We now begin our extension of this example. Choose $a > 0$ and set $f = \iota_U + (a/2)\|\cdot\|^2$, $g = \iota_V$, and $h = (1/2)\|\cdot\|^2$. Set $A = \partial f$, $B = \partial g$ and $C = \nabla h$. Note that $\mu_h = 1$ and $\mu_f = a$. Thus, $\nabla h$ is 1-Lipschitz, and, hence, $\beta = 1$ and we can choose $\gamma = 1 < 2\beta$. Therefore, $\alpha = 2\beta/(4\beta - \gamma) = 2/3$, so we can choose $\lambda_k \equiv 1 < 1/\alpha$. We also note that $\mathrm{prox}_{\gamma f} = (1/(1 + a))P_U$. For all $i \ge 0$, we have


T ■ = 

-L 1 ■ - 


—-(JV)i(2(JV)i - 

a + 1 

7 r 2 

, -1 ( p u)i 
a + 1 

0 0 ' 
0 -2 

+ 

0 O' 
0 1 

1 |"0 — 2 sin(0j) cos(6i] 


CL 1 


0 —2 sin 2 (0j) + a + 1 


(Pv)i 


where T = operator defined in Equation (1.3). Note that for all i > 0, the operator (T)i has eigenvector 


/ 2 cos(^) sin(^) \ 

V 1 + a — 2 sin 2 (0i) ’ / 


with eigenvalue $b_i := (a - 2(1 - c_i)^2 + 1)/(a + 1)$. Each component also has the eigenvector $(1, 0)$ with eigenvalue 0. Thus, the only fixed point of $T$ is $0 \in \mathcal{H}$. Finally, we note that


4^(1 -a 


(1 + a - 2(1 - Ci) 2 ) 2 


+ 1. 


(D.35) 


Slow convergence proofs 

Part 2 of Theorem 3.1 shows that $z^{k+1} - z^k \to 0$. The following result is a consequence of [3, Proposition 5.27].

Lemma D.2 (Strong convergence) Any sequence $(z^j)_{j \ge 0} \subseteq \mathcal{H}$ generated by Algorithm 1 converges strongly to 0.

The next lemma appeared in [24, Lemma 6].

Lemma D.3 (Arbitrarily slow sequence convergence) Suppose that $F : \mathbb{R}_+ \to (0, 1)$ is a function that is monotonically decreasing to zero. Then there exists a monotonic sequence $(b_j)_{j \ge 0} \subseteq (0, 1)$ such that $b_k \to 1^-$ as $k \to \infty$ and an increasing sequence of integers $(n_j)_{j \ge 0} \subseteq \mathbb{N} \cup \{0\}$ such that for all $k \ge 0$,

$$\frac{b_{n_k}^{k+1}}{n_k + 1} > F(k+1)e^{-1}. \qquad \text{(D.36)}$$

The following is a simple corollary of Lemma D.3; it first appeared in [23, Section 6.1].

Corollary D.10 Let the notation be as in Lemma D.3. Then for all $\eta \in (0, 1)$, we can find a sequence $(b_j)_{j \ge 0} \subseteq (\eta, 1)$ that satisfies the conditions of the lemma.

We are now ready to show that FDRS can converge arbitrarily slowly.

Theorem D.7 (Arbitrarily slow convergence of (1.4)) For every function $F : \mathbb{R}_+ \to (0, 1)$ that strictly decreases to zero, there is a point $z^0 \in \ell_2^2(\mathbb{N})$ and two closed subspaces $U$ and $V$ with zero intersection, $U \cap V = \{0\}$, such that the sequence $(z^j)_{j \ge 0}$ generated by Equation (1.4) applied to the functions $f = \iota_U + (a/2)\|\cdot\|^2$, $g = \iota_V$, and $h = (1/2)\|\cdot\|^2$, relaxation parameters $\lambda_k \equiv 1$, and step size $\gamma = 1$ satisfies the following bound:

$$\|z^k - z^*\| \ge e^{-1}F(k),$$

but $(\|z^j - z^*\|)_{j \ge 0}$ converges to 0.

Proof For all $i \ge 0$, define $z_i^0 = (1/(\|z_i\|(i+1)))z_i$, where $z_i$ is the eigenvector of $(T)_i$ given above; then $\|z_i^0\| = 1/(i+1)$ and $z_i^0$ is an eigenvector of $(T)_i$ with eigenvalue $b_i = (a - 2(1 - c_i)^2 + 1)/(a + 1)$. Define the concatenated vector $z^0 = (z_i^0)_{i \ge 0}$. Note that $z^0 \in \mathcal{H}$ because $\|z^0\|^2 = \sum_{i=0}^\infty 1/(i+1)^2 < \infty$. Thus, for all $k \ge 0$, we let $z^{k+1} = Tz^k$.

Now, recall that $z^* = 0$. Thus, for all $n \ge 0$ and $k \ge 0$, we have

$$\|z^{k+1} - z^*\|^2 = \|T^{k+1}z^0\|^2 = \sum_{i=0}^\infty b_i^{2(k+1)}\|z_i^0\|^2 = \sum_{i=0}^\infty \frac{b_i^{2(k+1)}}{(i+1)^2} \ge \frac{b_n^{2(k+1)}}{(n+1)^2}.$$


Thus, $\|z^k - z^*\| \ge b_n^{k+1}/(n+1)$. To get the lower bound, we choose $b_n$ and the sequence $(n_j)_{j \ge 0}$ using Corollary D.10 with any $\eta \in (\max\{0, (a-1)/(a+1)\}, 1)$. Then we solve for the coefficients: $c_n = 1 - \sqrt{(a+1)(1 - b_n)/2} > 0$. □
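The mechanism of the proof is that each block contracts at its own rate $b_i < 1$, but the rates accumulate at 1, so no single linear rate bounds the whole sequence. The Python sketch below illustrates this with a truncated, hypothetical choice $b_i = 1 - 1/(i+2)^2$ and block norms $1/(i+1)$. It is not the sequence constructed through Lemma D.3, only a stand-in with the same qualitative behavior.

```python
import math


def decay_norm(k, b):
    """||z^k|| when block i starts with norm 1/(i+1) and is scaled by b[i] at every step."""
    return math.sqrt(sum(bi ** (2 * k) / (i + 1) ** 2 for i, bi in enumerate(b)))


# Hypothetical eigenvalues increasing to 1, truncated to 2000 blocks.
b = [1.0 - 1.0 / (i + 2) ** 2 for i in range(2000)]


def step_ratio(k):
    """Observed one-step contraction ||z^{k+1}|| / ||z^k||."""
    return decay_norm(k + 1, b) / decay_norm(k, b)


# The one-step ratio creeps toward 1, so the decay is slower than any fixed linear rate.
for k in (5, 50, 200):
    print(k, decay_norm(k, b), step_ratio(k))
```

Every single block gives a lower bound $b_n^k/(n+1)$ on the norm, which is the same inequality the proof uses before choosing $n$ as a function of $k$.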

Remark D.f. Theorems D.7 and D.9 show that the sequence (z J )j> 0 can converge arbitrarily slowly even if 0 and {x J h )j>o 

converge with rate o(l/y/k + 1 ). 







